Title: [214887] trunk/Source
Revision
214887
Author
[email protected]
Date
2017-04-04 12:09:03 -0700 (Tue, 04 Apr 2017)

Log Message

Don't need to Air::reportUsedRegisters for wasm at -O1
https://bugs.webkit.org/show_bug.cgi?id=170459

Reviewed by Saam Barati.
        
Source/_javascript_Core:

I did some refactorings to Liveness<> to try to understand its performance. Based on
this I concluded that the bigger immediate issue is just removing unnecessary phases
from -O1.
        
This removes Air::reportUsedRegisters() from -O1 if the user has indicated that he is
not interested in StackmapGenerationParams::usedRegisters(). The logic here is a bit
weird because of how Air does spill code generation. The register allocator's spiller
will emit spill code using identifiable spill slots, which allows subsequent phases to
register-allocate the spill slots. We do this by a forward flow CSE phase called
fixObviousSpills (which is a terrible name since there is no longer anything obvious
about some of the spills that this phase can fix!). As is most natural for CSEs over
3AC, it rewires the uses of redundant computations rather than removing the redundant
computations. This means that if a spill got "fixed", there may be either or both of
the following:
        
- Dead loads from the stack.
- Dead stores to the stack.
        
We know that a load from the stack is dead if the register is dead at the point of the
load. We know that a store to the stack is dead if the spill slot is dead at the point
of the store.
        
Unfortunately, liveness analysis - over either registers or spill slots - is expensive.
        
Fortunately, allocateStack() already does liveness analysis over spill slots. So, we
baked elimination of stores to the stack into that phase. That aspect of clean-up after
the spill CSE comes for free.
        
Also fortunately for the FTL, we have to do reportUsedRegisters() anyway. This is a
phase that enables StackmapGenerationParams::usedRegisters() to work, which then
enables the FTL's patchpoints to do crazy slow-path live range splitting. So, Air's
strategy for the load fix-up after spill CSE is to do it as part of
reportUsedRegisters().
        
This patch introduces the Procedure::setNeedsUsedRegisters() API. But if you set
needsUsedRegisters to false then we will still run reportUsedRegisters() at -O2 as an
optimization - it removes dead loads from the stack that are left behind from
fixObviousSpills().
        
This is a ~6% compile time progression at -O1.

* b3/B3Procedure.h:
(JSC::B3::Procedure::setNeedsUsedRegisters):
(JSC::B3::Procedure::needsUsedRegisters):
* b3/B3StackmapGenerationParams.h:
* b3/B3VariableLiveness.cpp:
(JSC::B3::VariableLiveness::VariableLiveness):
* b3/air/AirCode.cpp:
(JSC::B3::Air::Code::needsUsedRegisters):
* b3/air/AirCode.h:
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::prepareForGeneration):
* b3/air/AirLiveness.h:
(JSC::B3::Air::Liveness::Liveness):
* wasm/WasmB3IRGenerator.cpp:
(JSC::Wasm::parseAndCompile):

Source/WTF:

Just moved the liveness computation into a method, which enabled me to do the profiling
that I used to write this patch.

* wtf/Liveness.h:
(WTF::Liveness::Liveness):
(WTF::Liveness::compute):

Modified Paths

Diff

Modified: trunk/Source/_javascript_Core/ChangeLog (214886 => 214887)


--- trunk/Source/_javascript_Core/ChangeLog	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/ChangeLog	2017-04-04 19:09:03 UTC (rev 214887)
@@ -1,3 +1,67 @@
+2017-04-04  Filip Pizlo  <[email protected]>
+
+        Don't need to Air::reportUsedRegisters for wasm at -O1
+        https://bugs.webkit.org/show_bug.cgi?id=170459
+
+        Reviewed by Saam Barati.
+        
+        I did some refactorings to Liveness<> to try to understand its performance. Based on
+        this I concluded that the bigger immediate issue is just removing unnecessary phases
+        from -O1.
+        
+        This removes Air::reportUsedRegisters() from -O1 if the user has indicated that he is
+        not interested in StackmapGenerationParams::usedRegisters(). The logic here is a bit
+        weird because of how Air does spill code generation. The register allocator's spiller
+        will emit spill code using identifiable spill slots, which allows subsequent phases to
+        register-allocate the spill slots. We do this by a forward flow CSE phase called
+        fixObviousSpills (which is a terrible name since there is no longer anything obvious
+        about some of the spills that this phase can fix!). As is most natural for CSEs over
+        3AC, it rewires the uses of redundant computations rather than removing the redundant
+        computations. This means that if a spill got "fixed", there may be either or both of
+        the following:
+        
+        - Dead loads from the stack.
+        - Dead stores to the stack.
+        
+        We know that a load from the stack is dead if the register is dead at the point of the
+        load. We know that a store to the stack is dead if the spill slot is dead at the point
+        of the store.
+        
+        Unfortunately, liveness analysis - over either registers or spill slots - is expensive.
+        
+        Fortunately, allocateStack() already does liveness analysis over spill slots. So, we
+        baked elimination of stores to the stack into that phase. That aspect of clean-up after
+        the spill CSE comes for free.
+        
+        Also fortunately for the FTL, we have to do reportUsedRegisters() anyway. This is a
+        phase that enables StackmapGenerationParams::usedRegisters() to work, which then
+        enables the FTL's patchpoints to do crazy slow-path live range splitting. So, Air's
+        strategy for the load fix-up after spill CSE is to do it as part of
+        reportUsedRegisters().
+        
+        This patch introduces the Procedure::setNeedsUsedRegisters() API. But if you set
+        needsUsedRegisters to false then we will still run reportUsedRegisters() at -O2 as an
+        optimization - it removes dead loads from the stack that are left behind from
+        fixObviousSpills().
+        
+        This is a ~6% compile time progression at -O1.
+
+        * b3/B3Procedure.h:
+        (JSC::B3::Procedure::setNeedsUsedRegisters):
+        (JSC::B3::Procedure::needsUsedRegisters):
+        * b3/B3StackmapGenerationParams.h:
+        * b3/B3VariableLiveness.cpp:
+        (JSC::B3::VariableLiveness::VariableLiveness):
+        * b3/air/AirCode.cpp:
+        (JSC::B3::Air::Code::needsUsedRegisters):
+        * b3/air/AirCode.h:
+        * b3/air/AirGenerate.cpp:
+        (JSC::B3::Air::prepareForGeneration):
+        * b3/air/AirLiveness.h:
+        (JSC::B3::Air::Liveness::Liveness):
+        * wasm/WasmB3IRGenerator.cpp:
+        (JSC::Wasm::parseAndCompile):
+
 2017-04-03  Filip Pizlo  <[email protected]>
 
         Air liveness should build constraints and solve them rather than repeatedly parsing IR

Modified: trunk/Source/_javascript_Core/b3/B3Procedure.h (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/B3Procedure.h	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/B3Procedure.h	2017-04-04 19:09:03 UTC (rev 214887)
@@ -232,6 +232,12 @@
 
     // This tells the register allocators to stay away from this register.
     JS_EXPORT_PRIVATE void pinRegister(Reg);
+    
+    // You can turn off used registers calculation. This may speed up compilation a bit. But if
+    // you turn it off then you cannot use StackmapGenerationParams::usedRegisters() or
+    // StackmapGenerationParams::unavailableRegisters().
+    void setNeedsUsedRegisters(bool value) { m_needsUsedRegisters = value; }
+    bool needsUsedRegisters() const { return m_needsUsedRegisters; }
 
     JS_EXPORT_PRIVATE unsigned frameSize() const;
     JS_EXPORT_PRIVATE const RegisterAtOffsetList& calleeSaveRegisters() const;
@@ -267,6 +273,7 @@
     RefPtr<SharedTask<void(PrintStream&, Origin)>> m_originPrinter;
     const void* m_frontendData;
     PCToOriginMap m_pcToOriginMap;
+    bool m_needsUsedRegisters { true };
     bool m_hasQuirks { false };
 };
 

Modified: trunk/Source/_javascript_Core/b3/B3StackmapGenerationParams.cpp (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/B3StackmapGenerationParams.cpp	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/B3StackmapGenerationParams.cpp	2017-04-04 19:09:03 UTC (rev 214887)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2017 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -38,6 +38,8 @@
 
 const RegisterSet& StackmapGenerationParams::usedRegisters() const
 {
+    ASSERT(m_context.code->needsUsedRegisters());
+    
     return m_value->m_usedRegisters;
 }
 

Modified: trunk/Source/_javascript_Core/b3/B3StackmapGenerationParams.h (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/B3StackmapGenerationParams.h	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/B3StackmapGenerationParams.h	2017-04-04 19:09:03 UTC (rev 214887)
@@ -60,6 +60,7 @@
     Vector<ValueRep>::const_iterator end() const { return m_reps.end(); }
     
     // This tells you the registers that were used.
+    // NOTE: This will report bogus information if you did proc.setNeedsUsedRegisters(false).
     const RegisterSet& usedRegisters() const;
 
     // This is a useful helper if you want to do register allocation inside of a patchpoint. The
@@ -79,6 +80,8 @@
     //
     // I.e. it is like usedRegisters() but also includes unsaved callee-saves and excludes scratch
     // registers.
+    //
+    // NOTE: This will report bogus information if you did proc.setNeedsUsedRegisters(false).
     JS_EXPORT_PRIVATE RegisterSet unavailableRegisters() const;
 
     GPRReg gpScratch(unsigned index) const { return m_gpScratch[index]; }

Modified: trunk/Source/_javascript_Core/b3/B3VariableLiveness.cpp (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/B3VariableLiveness.cpp	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/B3VariableLiveness.cpp	2017-04-04 19:09:03 UTC (rev 214887)
@@ -33,6 +33,7 @@
 VariableLiveness::VariableLiveness(Procedure& proc)
     : WTF::Liveness<VariableLivenessAdapter>(proc.cfg(), proc)
 {
+    compute();
 }
 
 VariableLiveness::~VariableLiveness()

Modified: trunk/Source/_javascript_Core/b3/air/AirCode.cpp (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/air/AirCode.cpp	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/air/AirCode.cpp	2017-04-04 19:09:03 UTC (rev 214887)
@@ -87,6 +87,11 @@
     ASSERT(!regs.contains(reg));
 }
 
+bool Code::needsUsedRegisters() const
+{
+    return m_proc.needsUsedRegisters();
+}
+
 BasicBlock* Code::addBlock(double frequency)
 {
     std::unique_ptr<BasicBlock> block(new BasicBlock(m_blocks.size(), frequency));

Modified: trunk/Source/_javascript_Core/b3/air/AirCode.h (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/air/AirCode.h	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/air/AirCode.h	2017-04-04 19:09:03 UTC (rev 214887)
@@ -90,6 +90,8 @@
     bool isPinned(Reg reg) const { return !mutableRegs().get(reg); }
     
     void pinRegister(Reg);
+    
+    bool needsUsedRegisters() const;
 
     JS_EXPORT_PRIVATE BasicBlock* addBlock(double frequency = 1);
 

Modified: trunk/Source/_javascript_Core/b3/air/AirGenerate.cpp (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/air/AirGenerate.cpp	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/air/AirGenerate.cpp	2017-04-04 19:09:03 UTC (rev 214887)
@@ -122,8 +122,10 @@
     // phase.
     simplifyCFG(code);
 
-    // This is needed to satisfy a requirement of B3::StackmapValue.
-    reportUsedRegisters(code);
+    // This is needed to satisfy a requirement of B3::StackmapValue. This also removes dead
+    // code. We can avoid running this when certain optimizations are disabled.
+    if (optLevel >= 2 || code.needsUsedRegisters())
+        reportUsedRegisters(code);
 
     // Attempt to remove false dependencies between instructions created by partial register changes.
     // This must be executed as late as possible as it depends on the instructions order and register

Modified: trunk/Source/_javascript_Core/b3/air/AirLiveness.h (214886 => 214887)


--- trunk/Source/_javascript_Core/b3/air/AirLiveness.h	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/b3/air/AirLiveness.h	2017-04-04 19:09:03 UTC (rev 214887)
@@ -38,6 +38,8 @@
     Liveness(Code& code)
         : WTF::Liveness<Adapter>(code.cfg(), code)
     {
+        SuperSamplerScope samplingScope(false);
+        WTF::Liveness<Adapter>::compute();
     }
 };
 

Modified: trunk/Source/_javascript_Core/wasm/WasmB3IRGenerator.cpp (214886 => 214887)


--- trunk/Source/_javascript_Core/wasm/WasmB3IRGenerator.cpp	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/_javascript_Core/wasm/WasmB3IRGenerator.cpp	2017-04-04 19:09:03 UTC (rev 214887)
@@ -1282,6 +1282,12 @@
         if (origin.data())
             out.print("Wasm: ", bitwise_cast<OpcodeOrigin>(origin));
     });
+    
+    // This means we cannot use either StackmapGenerationParams::usedRegisters() or
+    // StackmapGenerationParams::unavailableRegisters(). In exchange for this concession, we
+    // don't strictly need to run Air::reportUsedRegisters(), which saves a bit of CPU time at
+    // optLevel=1.
+    procedure.setNeedsUsedRegisters(false);
 
     B3IRGenerator context(info, procedure, result.get(), unlinkedWasmToWasmCalls, mode);
     FunctionParser<B3IRGenerator> parser(context, functionStart, functionLength, signature, info, moduleSignatureIndicesToUniquedSignatureIndices);
@@ -1296,7 +1302,7 @@
     dataLogIf(verbose, "Pre SSA: ", procedure);
     fixSSA(procedure);
     dataLogIf(verbose, "Post SSA: ", procedure);
-
+    
     {
         B3::prepareForGeneration(procedure, optLevel);
         B3::generate(procedure, *compilationContext.wasmEntrypointJIT);

Modified: trunk/Source/WTF/ChangeLog (214886 => 214887)


--- trunk/Source/WTF/ChangeLog	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/WTF/ChangeLog	2017-04-04 19:09:03 UTC (rev 214887)
@@ -1,3 +1,17 @@
+2017-04-04  Filip Pizlo  <[email protected]>
+
+        Don't need to Air::reportUsedRegisters for wasm at -O1
+        https://bugs.webkit.org/show_bug.cgi?id=170459
+
+        Reviewed by Saam Barati.
+        
+        Just moved the liveness computation into a method, which enabled me to do the profiling
+        that I used to write this patch.
+
+        * wtf/Liveness.h:
+        (WTF::Liveness::Liveness):
+        (WTF::Liveness::compute):
+
 2017-04-03  Filip Pizlo  <[email protected]>
 
         Air liveness should build constraints and solve them rather than repeatedly parsing IR

Modified: trunk/Source/WTF/wtf/Liveness.h (214886 => 214887)


--- trunk/Source/WTF/wtf/Liveness.h	2017-04-04 19:06:46 UTC (rev 214886)
+++ trunk/Source/WTF/wtf/Liveness.h	2017-04-04 19:09:03 UTC (rev 214887)
@@ -49,103 +49,8 @@
         , m_liveAtHead(cfg.template newMap<IndexVector>())
         , m_liveAtTail(cfg.template newMap<IndexVector>())
     {
-        // The liveAtTail of each block automatically contains the LateUse's of the terminal.
-        for (unsigned blockIndex = m_cfg.numNodes(); blockIndex--;) {
-            typename CFG::Node block = m_cfg.node(blockIndex);
-            if (!block)
-                continue;
-            
-            IndexVector& liveAtTail = m_liveAtTail[block];
-
-            Adapter::forEachUse(
-                block, Adapter::blockSize(block),
-                [&] (unsigned index) {
-                    liveAtTail.append(index);
-                });
-            
-            std::sort(liveAtTail.begin(), liveAtTail.end());
-            removeRepeatedElements(liveAtTail);
-        }
-
-        // Blocks with new live values at tail.
-        BitVector dirtyBlocks;
-        for (size_t blockIndex = m_cfg.numNodes(); blockIndex--;)
-            dirtyBlocks.set(blockIndex);
-        
-        IndexVector mergeBuffer;
-        
-        bool changed;
-        do {
-            changed = false;
-
-            for (size_t blockIndex = m_cfg.numNodes(); blockIndex--;) {
-                typename CFG::Node block = m_cfg.node(blockIndex);
-                if (!block)
-                    continue;
-
-                if (!dirtyBlocks.quickClear(blockIndex))
-                    continue;
-
-                LocalCalc localCalc(*this, block);
-                for (size_t instIndex = Adapter::blockSize(block); instIndex--;)
-                    localCalc.execute(instIndex);
-
-                // Handle the early def's of the first instruction.
-                Adapter::forEachDef(
-                    block, 0,
-                    [&] (unsigned index) {
-                        m_workset.remove(index);
-                    });
-
-                IndexVector& liveAtHead = m_liveAtHead[block];
-
-                // We only care about Tmps that were discovered in this iteration. It is impossible
-                // to remove a live value from the head.
-                // We remove all the values we already knew about so that we only have to deal with
-                // what is new in LiveAtHead.
-                if (m_workset.size() == liveAtHead.size())
-                    m_workset.clear();
-                else {
-                    for (unsigned liveIndexAtHead : liveAtHead)
-                        m_workset.remove(liveIndexAtHead);
-                }
-
-                if (m_workset.isEmpty())
-                    continue;
-
-                liveAtHead.reserveCapacity(liveAtHead.size() + m_workset.size());
-                for (unsigned newValue : m_workset)
-                    liveAtHead.uncheckedAppend(newValue);
-                
-                m_workset.sort();
-                
-                for (typename CFG::Node predecessor : m_cfg.predecessors(block)) {
-                    IndexVector& liveAtTail = m_liveAtTail[predecessor];
-                    
-                    if (liveAtTail.isEmpty())
-                        liveAtTail = m_workset.values();
-                    else {
-                        mergeBuffer.resize(liveAtTail.size() + m_workset.size());
-                        auto iter = mergeDeduplicatedSorted(
-                            liveAtTail.begin(), liveAtTail.end(),
-                            m_workset.begin(), m_workset.end(),
-                            mergeBuffer.begin());
-                        mergeBuffer.resize(iter - mergeBuffer.begin());
-                        
-                        if (mergeBuffer.size() == liveAtTail.size())
-                            continue;
-                    
-                        RELEASE_ASSERT(mergeBuffer.size() > liveAtTail.size());
-                        liveAtTail = mergeBuffer;
-                    }
-                    
-                    dirtyBlocks.quickSet(predecessor->index());
-                    changed = true;
-                }
-            }
-        } while (changed);
     }
-
+    
     // This calculator has to be run in reverse.
     class LocalCalc {
     public:
@@ -349,6 +254,106 @@
     
     LiveAtHead liveAtHead() { return LiveAtHead(*this); }
 
+protected:
+    void compute()
+    {
+        // The liveAtTail of each block automatically contains the LateUse's of the terminal.
+        for (unsigned blockIndex = m_cfg.numNodes(); blockIndex--;) {
+            typename CFG::Node block = m_cfg.node(blockIndex);
+            if (!block)
+                continue;
+            
+            IndexVector& liveAtTail = m_liveAtTail[block];
+
+            Adapter::forEachUse(
+                block, Adapter::blockSize(block),
+                [&] (unsigned index) {
+                    liveAtTail.append(index);
+                });
+            
+            std::sort(liveAtTail.begin(), liveAtTail.end());
+            removeRepeatedElements(liveAtTail);
+        }
+
+        // Blocks with new live values at tail.
+        BitVector dirtyBlocks;
+        for (size_t blockIndex = m_cfg.numNodes(); blockIndex--;)
+            dirtyBlocks.set(blockIndex);
+        
+        IndexVector mergeBuffer;
+        
+        bool changed;
+        do {
+            changed = false;
+
+            for (size_t blockIndex = m_cfg.numNodes(); blockIndex--;) {
+                typename CFG::Node block = m_cfg.node(blockIndex);
+                if (!block)
+                    continue;
+
+                if (!dirtyBlocks.quickClear(blockIndex))
+                    continue;
+
+                LocalCalc localCalc(*this, block);
+                for (size_t instIndex = Adapter::blockSize(block); instIndex--;)
+                    localCalc.execute(instIndex);
+
+                // Handle the early def's of the first instruction.
+                Adapter::forEachDef(
+                    block, 0,
+                    [&] (unsigned index) {
+                        m_workset.remove(index);
+                    });
+
+                IndexVector& liveAtHead = m_liveAtHead[block];
+
+                // We only care about Tmps that were discovered in this iteration. It is impossible
+                // to remove a live value from the head.
+                // We remove all the values we already knew about so that we only have to deal with
+                // what is new in LiveAtHead.
+                if (m_workset.size() == liveAtHead.size())
+                    m_workset.clear();
+                else {
+                    for (unsigned liveIndexAtHead : liveAtHead)
+                        m_workset.remove(liveIndexAtHead);
+                }
+
+                if (m_workset.isEmpty())
+                    continue;
+
+                liveAtHead.reserveCapacity(liveAtHead.size() + m_workset.size());
+                for (unsigned newValue : m_workset)
+                    liveAtHead.uncheckedAppend(newValue);
+                
+                m_workset.sort();
+                
+                for (typename CFG::Node predecessor : m_cfg.predecessors(block)) {
+                    IndexVector& liveAtTail = m_liveAtTail[predecessor];
+                    
+                    if (liveAtTail.isEmpty())
+                        liveAtTail = m_workset.values();
+                    else {
+                        mergeBuffer.resize(liveAtTail.size() + m_workset.size());
+                        auto iter = mergeDeduplicatedSorted(
+                            liveAtTail.begin(), liveAtTail.end(),
+                            m_workset.begin(), m_workset.end(),
+                            mergeBuffer.begin());
+                        mergeBuffer.resize(iter - mergeBuffer.begin());
+                        
+                        if (mergeBuffer.size() == liveAtTail.size())
+                            continue;
+                    
+                        RELEASE_ASSERT(mergeBuffer.size() > liveAtTail.size());
+                        liveAtTail = mergeBuffer;
+                    }
+                    
+                    dirtyBlocks.quickSet(predecessor->index());
+                    changed = true;
+                }
+            }
+        } while (changed);
+    }
+
 private:
     friend class LocalCalc;
     friend class LocalCalc::Iterable;
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to