Title: [278576] trunk/Source/JavaScriptCore
Revision: 278576
Author: [email protected]
Date: 2021-06-07 15:51:25 -0700 (Mon, 07 Jun 2021)

Log Message

Put the Baseline JIT prologue and op_loop_hint code in JIT thunks.
https://bugs.webkit.org/show_bug.cgi?id=226375

Reviewed by Keith Miller and Robin Morisset.

Baseline JIT prologue code varies in behavior based on several variables.  These
variables include (1) whether the prologue does any argument value profiling,
(2) whether the prologue is for a constructor, and (3) whether the compiled
CodeBlock will have a frame so large that it exceeds the stack reserved zone
(aka the red zone), which requires additional stack check logic.
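
As a minimal sketch of how these three flags are derived (editor's illustration, not
JSC code: CodeBlockInfo, specializationFor, and the reservedZoneSize parameter are
invented stand-ins for CodeBlock state and Options::reservedZoneSize()):

    #include <cstddef>
    #include <cstdio>

    // Hypothetical stand-ins for the CodeBlock state the Baseline JIT consults
    // when picking a prologue specialization.
    struct CodeBlockInfo {
        bool isFunctionCode;
        bool isConstructor;
        bool shouldEmitProfiling;
        size_t maxFrameSizeInBytes; // frame size the prologue will establish
    };

    struct PrologueSpecialization {
        bool doesProfiling;
        bool isConstructor;
        bool hasHugeFrame;
    };

    // Profiling only applies to function code, and a frame larger than the
    // reserved (red) zone needs the extra stack check logic in the prologue.
    static PrologueSpecialization specializationFor(const CodeBlockInfo& codeBlock, size_t reservedZoneSize)
    {
        return {
            codeBlock.isFunctionCode && codeBlock.shouldEmitProfiling,
            codeBlock.isConstructor,
            codeBlock.maxFrameSizeInBytes > reservedZoneSize,
        };
    }

    int main()
    {
        CodeBlockInfo codeBlock { /* isFunctionCode */ true, /* isConstructor */ false,
                                  /* shouldEmitProfiling */ true, /* maxFrameSizeInBytes */ 256 * 1024 };
        PrologueSpecialization spec = specializationFor(codeBlock, /* reservedZoneSize */ 128 * 1024);
        std::printf("doesProfiling=%d isConstructor=%d hasHugeFrame=%d\n",
            spec.doesProfiling, spec.isConstructor, spec.hasHugeFrame);
        return 0;
    }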

The pre-existing code generated specialized code based on these (and other)
variables.  In converting the prologue to thunks, we opt not to turn these
specializations into runtime checks.  Instead, the implementation uses 1 of 8
possible specialized thunks, which reduces the number of arguments that would
otherwise have to be passed for runtime checks.  The only argument passed to
the prologue thunks is the codeBlock pointer.

There are 8 possible thunks because we specialize based on 3 variables:
1. doesProfiling
2. isConstructor
3. hasHugeFrame

2**3 yields 8 permutations of prologue thunk specializations.
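
As an illustration of the selection scheme, here is a minimal standalone C++ sketch
(editor's example, not JSC code): the bit packing mirrors the prologueGeneratorSelector()
helper this patch adds to jit/JIT.cpp, while the table entries are placeholder strings
standing in for the real prologueGenerator0 through prologueGenerator7 thunk generators.

    #include <array>
    #include <cstdio>

    // Mirrors the patch's prologueGeneratorSelector(): pack the three
    // specialization flags into a 3-bit index (doesProfiling is the high bit,
    // hasHugeFrame the low bit).
    static constexpr unsigned prologueGeneratorSelector(bool doesProfiling, bool isConstructor, bool hasHugeFrame)
    {
        return doesProfiling << 2 | isConstructor << 1 | hasHugeFrame << 0;
    }

    int main()
    {
        // Placeholder names standing in for the 8 specialized thunk generators.
        static constexpr std::array<const char*, 8> generators { {
            "prologueGenerator0", // !profiling, !constructor, !hugeFrame
            "prologueGenerator1", // !profiling, !constructor,  hugeFrame
            "prologueGenerator2", // !profiling,  constructor, !hugeFrame
            "prologueGenerator3", // !profiling,  constructor,  hugeFrame
            "prologueGenerator4", //  profiling, !constructor, !hugeFrame
            "prologueGenerator5", //  profiling, !constructor,  hugeFrame
            "prologueGenerator6", //  profiling,  constructor, !hugeFrame
            "prologueGenerator7", //  profiling,  constructor,  hugeFrame
        } };

        unsigned selector = prologueGeneratorSelector(/* doesProfiling */ true, /* isConstructor */ false, /* hasHugeFrame */ true);
        std::printf("selector %u -> %s\n", selector, generators[selector]); // selector 5 -> prologueGenerator5
        return 0;
    }

The actual table and the near-call that passes only the codeBlock pointer live in
JIT::compileAndLinkWithoutFinalizing() in the diff below.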

There are also 8 analogous arity fixup prologue thunks, specialized in the same way.

The op_loop_hint thunk only takes 1 runtime argument: the bytecode offset.

We tried doing the loop_hint optimization check in the thunk (in order to move
both the fast and slow paths into the thunk for maximum space savings).  However,
this had a slight negative impact on benchmark performance.  We ended up keeping
the fast path inline and having the slow path call a thunk to do its work.  This
realizes the bulk of the size savings without the perf impact.
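
A rough model of that fast path / slow path split (editor's sketch; FakeCodeBlock,
loopHintFastPath, and loopHintSlowPathThunk are invented names, and the real slow
path calls operationOptimize and may far-jump to an optimized entry point):

    #include <cstdint>
    #include <cstdio>

    struct FakeCodeBlock {
        int32_t jitExecuteCounter; // models the CodeBlock's JIT execution counter
    };

    // Stand-in for the shared op_loop_hint slow-path thunk, which receives only
    // the bytecode offset as its runtime argument.
    static void loopHintSlowPathThunk(FakeCodeBlock&, uint32_t bytecodeOffset)
    {
        std::printf("slow path taken at bytecode offset %u\n", bytecodeOffset);
    }

    // Inline fast path emitted per op_loop_hint: bump the counter and only call
    // the thunk when it becomes positive or zero (mirroring the
    // branchAdd32(PositiveOrZero, ...) check in emit_op_loop_hint).
    static void loopHintFastPath(FakeCodeBlock& codeBlock, uint32_t bytecodeOffset, int32_t increment)
    {
        codeBlock.jitExecuteCounter += increment;
        if (codeBlock.jitExecuteCounter >= 0)
            loopHintSlowPathThunk(codeBlock, bytecodeOffset);
    }

    int main()
    {
        FakeCodeBlock codeBlock { -100 };
        for (int i = 0; i < 5; ++i)
            loopHintFastPath(codeBlock, /* bytecodeOffset */ 42, /* increment */ 30);
        return 0;
    }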

This patch also optimizes op_enter a bit more by eliminating the need to pass any
arguments to the thunk.  The thunk previously took 2 arguments: localsToInit and
canBeOptimized.  localsToInit is now computed in the thunk at runtime, and
canBeOptimized is used as a specialization argument to generate 2 variants of the
op_enter thunk: op_enter_canBeOptimized_Generator and op_enter_cannotBeOptimized_Generator,
thereby removing the need to pass it as a runtime argument.
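
A simplified model of what the op_enter thunk now computes at runtime instead of
receiving as an argument (editor's sketch; localsToInitBytes is an invented helper,
and the callee-save constant is a placeholder for
CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters(), not its real value):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Placeholder for CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters().
    static constexpr unsigned calleeSaveSpaceAsVirtualRegisters = 4;

    // The thunk loads CodeBlock::m_numVars (via CodeBlock::offsetOfNumVars()),
    // subtracts the callee-save space, and shifts left by 3 to get a byte count
    // of locals to initialize to undefined.
    static size_t localsToInitBytes(unsigned numVars)
    {
        unsigned localsToInit = numVars - calleeSaveSpaceAsVirtualRegisters;
        return static_cast<size_t>(localsToInit) * sizeof(uint64_t); // one 8-byte Register per local
    }

    int main()
    {
        std::printf("%zu bytes of locals to clear\n", localsToInitBytes(/* numVars */ 20)); // 128 bytes
        return 0;
    }

The real computation is done with MacroAssembler loads and shifts in
op_enter_Generator(); see jit/JITOpcodes.cpp in the diff below.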

LinkBuffer size results (from a single run of Speedometer2):

   BaselineJIT: 93319628 (88.996532 MB)   => 83851824 (79.967331 MB)   0.90x
 ExtraCTIThunk: 5992 (5.851562 KB)        => 6984 (6.820312 KB)        1.17x
                ...
         Total: 197530008 (188.379295 MB) => 188459444 (179.728931 MB) 0.95x

Speedometer2 and JetStream2 results (as measured on an M1 Mac) are neutral.

* assembler/AbstractMacroAssembler.h:
(JSC::AbstractMacroAssembler::untagReturnAddressWithoutExtraValidation):
* assembler/MacroAssemblerARM64E.h:
(JSC::MacroAssemblerARM64E::untagReturnAddress):
(JSC::MacroAssemblerARM64E::untagReturnAddressWithoutExtraValidation):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::branchAdd32):
* assembler/MacroAssemblerMIPS.h:
(JSC::MacroAssemblerMIPS::branchAdd32):
* bytecode/CodeBlock.h:
(JSC::CodeBlock::offsetOfNumCalleeLocals):
(JSC::CodeBlock::offsetOfNumVars):
(JSC::CodeBlock::offsetOfArgumentValueProfiles):
(JSC::CodeBlock::offsetOfShouldAlwaysBeInlined):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitSaveCalleeSavesFor):
(JSC::AssemblyHelpers::emitSaveCalleeSavesForBaselineJIT):
(JSC::AssemblyHelpers::emitRestoreCalleeSavesForBaselineJIT):
* jit/JIT.cpp:
(JSC::JIT::compileAndLinkWithoutFinalizing):
(JSC::JIT::prologueGenerator):
(JSC::JIT::arityFixupPrologueGenerator):
(JSC::JIT::privateCompileExceptionHandlers):
* jit/JIT.h:
* jit/JITInlines.h:
(JSC::JIT::emitNakedNearCall):
* jit/JITOpcodes.cpp:
(JSC::JIT::op_ret_handlerGenerator):
(JSC::JIT::emit_op_enter):
(JSC::JIT::op_enter_Generator):
(JSC::JIT::op_enter_canBeOptimized_Generator):
(JSC::JIT::op_enter_cannotBeOptimized_Generator):
(JSC::JIT::emit_op_loop_hint):
(JSC::JIT::emitSlow_op_loop_hint):
(JSC::JIT::op_loop_hint_Generator):
(JSC::JIT::op_enter_handlerGenerator): Deleted.
* jit/JITOpcodes32_64.cpp:
(JSC::JIT::emit_op_enter):
* jit/ThunkGenerators.cpp:
(JSC::popThunkStackPreservesAndHandleExceptionGenerator):

Modified Paths

Diff

Modified: trunk/Source/JavaScriptCore/ChangeLog (278575 => 278576)


--- trunk/Source/JavaScriptCore/ChangeLog	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/ChangeLog	2021-06-07 22:51:25 UTC (rev 278576)
@@ -1,3 +1,97 @@
+2021-06-07  Mark Lam  <[email protected]>
+
+        Put the Baseline JIT prologue and op_loop_hint code in JIT thunks.
+        https://bugs.webkit.org/show_bug.cgi?id=226375
+
+        Reviewed by Keith Miller and Robin Morisset.
+
+        Baseline JIT prologue code varies in behavior based on several variables.  These
+        variables include (1) whether the prologue does any arguments value profiling,
+        (2) whether the prologue is for a constructor, and (3) whether the compiled
+        CodeBlock will have such a large frame that it is greater than the stack reserved
+        zone (aka red zone) which would require additional stack check logic.
+
+        The pre-existing code would generate specialized code based on these (and other
+        variables).  In converting to using thunks for the prologue, we opt not to
+        convert these specializations into runtime checks.  Instead, the implementation
+        uses 1 of 8 possible specialized thunks to reduce the need to pass arguments for
+        runtime checks.  The only needed argument passed to the prologue thunks is the
+        codeBlock pointer.
+
+        There are 8 possible thunks because we specialize based on 3 variables:
+        1. doesProfiling
+        2. isConstructor
+        3. hasHugeFrame
+
+        2**3 yields 8 permutations of prologue thunk specializations.
+
+        Similarly, there are also 8 analogous arity fixup prologues that work similarly.
+
+        The op_loop_hint thunk only takes 1 runtime argument: the bytecode offset.
+
+        We've tried doing the loop_hint optimization check in the thunk (in order to move
+        both the fast and slow path into the thunk for maximum space savings).  However,
+        this seems to have some slight negative impact on benchmark performance.  We ended
+        up just keeping the fast path and instead have the slow path call a thunk to do
+        its work.  This realizes the bulk of the size savings without the perf impact.
+
+        This patch also optimizes op_enter a bit more by eliminating the need to pass any
+        arguments to the thunk.  The thunk previously took 2 arguments: localsToInit and
+        canBeOptimized.  localsToInit is now computed in the thunk at runtime, and
+        canBeOptimized is used as a specialization argument to generate 2 variants of the
+        op_enter thunk: op_enter_canBeOptimized_Generator and op_enter_cannotBeOptimized_Generator,
+        thereby removing the need to pass it as a runtime argument.
+
+        LinkBuffer size results (from a single run of Speedometer2):
+
+           BaselineJIT: 93319628 (88.996532 MB)   => 83851824 (79.967331 MB)   0.90x
+         ExtraCTIThunk: 5992 (5.851562 KB)        => 6984 (6.820312 KB)        1.17x
+                        ...
+                 Total: 197530008 (188.379295 MB) => 188459444 (179.728931 MB) 0.95x
+
+        Speedometer2 and JetStream2 results (as measured on an M1 Mac) are neutral.
+
+        * assembler/AbstractMacroAssembler.h:
+        (JSC::AbstractMacroAssembler::untagReturnAddressWithoutExtraValidation):
+        * assembler/MacroAssemblerARM64E.h:
+        (JSC::MacroAssemblerARM64E::untagReturnAddress):
+        (JSC::MacroAssemblerARM64E::untagReturnAddressWithoutExtraValidation):
+        * assembler/MacroAssemblerARMv7.h:
+        (JSC::MacroAssemblerARMv7::branchAdd32):
+        * assembler/MacroAssemblerMIPS.h:
+        (JSC::MacroAssemblerMIPS::branchAdd32):
+        * bytecode/CodeBlock.h:
+        (JSC::CodeBlock::offsetOfNumCalleeLocals):
+        (JSC::CodeBlock::offsetOfNumVars):
+        (JSC::CodeBlock::offsetOfArgumentValueProfiles):
+        (JSC::CodeBlock::offsetOfShouldAlwaysBeInlined):
+        * jit/AssemblyHelpers.h:
+        (JSC::AssemblyHelpers::emitSaveCalleeSavesFor):
+        (JSC::AssemblyHelpers::emitSaveCalleeSavesForBaselineJIT):
+        (JSC::AssemblyHelpers::emitRestoreCalleeSavesForBaselineJIT):
+        * jit/JIT.cpp:
+        (JSC::JIT::compileAndLinkWithoutFinalizing):
+        (JSC::JIT::prologueGenerator):
+        (JSC::JIT::arityFixupPrologueGenerator):
+        (JSC::JIT::privateCompileExceptionHandlers):
+        * jit/JIT.h:
+        * jit/JITInlines.h:
+        (JSC::JIT::emitNakedNearCall):
+        * jit/JITOpcodes.cpp:
+        (JSC::JIT::op_ret_handlerGenerator):
+        (JSC::JIT::emit_op_enter):
+        (JSC::JIT::op_enter_Generator):
+        (JSC::JIT::op_enter_canBeOptimized_Generator):
+        (JSC::JIT::op_enter_cannotBeOptimized_Generator):
+        (JSC::JIT::emit_op_loop_hint):
+        (JSC::JIT::emitSlow_op_loop_hint):
+        (JSC::JIT::op_loop_hint_Generator):
+        (JSC::JIT::op_enter_handlerGenerator): Deleted.
+        * jit/JITOpcodes32_64.cpp:
+        (JSC::JIT::emit_op_enter):
+        * jit/ThunkGenerators.cpp:
+        (JSC::popThunkStackPreservesAndHandleExceptionGenerator):
+
 2021-06-07  Robin Morisset  <[email protected]>
 
         Optimize compareStrictEq when neither side is a double and at least one is neither a string nor a BigInt

Modified: trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -1003,6 +1003,7 @@
 
     ALWAYS_INLINE void tagReturnAddress() { }
     ALWAYS_INLINE void untagReturnAddress(RegisterID = RegisterID::InvalidGPRReg) { }
+    ALWAYS_INLINE void untagReturnAddressWithoutExtraValidation() { }
 
     ALWAYS_INLINE void tagPtr(PtrTag, RegisterID) { }
     ALWAYS_INLINE void tagPtr(RegisterID, RegisterID) { }

Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerARM64E.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerARM64E.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerARM64E.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -54,10 +54,15 @@
 
     ALWAYS_INLINE void untagReturnAddress(RegisterID scratch = InvalidGPR)
     {
-        untagPtr(ARM64Registers::sp, ARM64Registers::lr);
+        untagReturnAddressWithoutExtraValidation();
         validateUntaggedPtr(ARM64Registers::lr, scratch);
     }
 
+    ALWAYS_INLINE void untagReturnAddressWithoutExtraValidation()
+    {
+        untagPtr(ARM64Registers::sp, ARM64Registers::lr);
+    }
+
     ALWAYS_INLINE void tagPtr(PtrTag tag, RegisterID target)
     {
         auto tagGPR = getCachedDataTempRegisterIDAndInvalidate();

Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -1795,6 +1795,23 @@
         return branchAdd32(cond, dest, imm, dest);
     }
 
+    Jump branchAdd32(ResultCondition cond, TrustedImm32 imm, Address dest)
+    {
+        load32(dest, dataTempRegister);
+
+        // Do the add.
+        ARMThumbImmediate armImm = ARMThumbImmediate::makeEncodedImm(imm.m_value);
+        if (armImm.isValid())
+            m_assembler.add_S(dataTempRegister, dataTempRegister, armImm);
+        else {
+            move(imm, addressTempRegister);
+            m_assembler.add_S(dataTempRegister, dataTempRegister, addressTempRegister);
+        }
+
+        store32(dataTempRegister, dest);
+        return Jump(makeBranch(cond));
+    }
+
     Jump branchAdd32(ResultCondition cond, TrustedImm32 imm, AbsoluteAddress dest)
     {
         // Move the high bits of the address into addressTempRegister,

Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -2310,6 +2310,111 @@
         return Jump();
     }
 
+    Jump branchAdd32(ResultCondition cond, TrustedImm32 imm, ImplicitAddress destAddress)
+    {
+        bool useAddrTempRegister = !(destAddress.offset >= -32768 && destAddress.offset <= 32767
+            && !m_fixedWidth);
+
+        if (useAddrTempRegister) {
+            m_assembler.lui(addrTempRegister, (destAddress.offset + 0x8000) >> 16);
+            m_assembler.addu(addrTempRegister, addrTempRegister, destAddress.base);
+        }
+
+        auto loadDest = [&] (RegisterID dest) {
+            if (useAddrTempRegister)
+                m_assembler.lw(dest, addrTempRegister, destAddress.offset);
+            else
+                m_assembler.lw(dest, destAddress.base, destAddress.offset);
+        };
+
+        auto storeDest = [&] (RegisterID src) {
+            if (useAddrTempRegister)
+                m_assembler.sw(src, addrTempRegister, destAddress.offset);
+            else
+                m_assembler.sw(src, destAddress.base, destAddress.offset);
+        };
+
+        ASSERT((cond == Overflow) || (cond == Signed) || (cond == PositiveOrZero) || (cond == Zero) || (cond == NonZero));
+        if (cond == Overflow) {
+            if (m_fixedWidth) {
+                /*
+                    load    dest, dataTemp
+                    move    imm, immTemp
+                    xor     cmpTemp, dataTemp, immTemp
+                    addu    dataTemp, dataTemp, immTemp
+                    store   dataTemp, dest
+                    bltz    cmpTemp, No_overflow    # diff sign bit -> no overflow
+                    xor     cmpTemp, dataTemp, immTemp
+                bgez    cmpTemp, No_overflow    # same sign bit -> no overflow
+                    nop
+                    b       Overflow
+                    nop
+                    b       No_overflow
+                    nop
+                    nop
+                    nop
+                No_overflow:
+                */
+                loadDest(dataTempRegister);
+                move(imm, immTempRegister);
+                m_assembler.xorInsn(cmpTempRegister, dataTempRegister, immTempRegister);
+                m_assembler.addu(dataTempRegister, dataTempRegister, immTempRegister);
+                storeDest(dataTempRegister);
+                m_assembler.bltz(cmpTempRegister, 9);
+                m_assembler.xorInsn(cmpTempRegister, dataTempRegister, immTempRegister);
+                m_assembler.bgez(cmpTempRegister, 7);
+                m_assembler.nop();
+            } else {
+                loadDest(dataTempRegister);
+                if (imm.m_value >= 0 && imm.m_value  <= 32767) {
+                    move(dataTempRegister, cmpTempRegister);
+                    m_assembler.addiu(dataTempRegister, dataTempRegister, imm.m_value);
+                    m_assembler.bltz(cmpTempRegister, 9);
+                    storeDest(dataTempRegister);
+                    m_assembler.bgez(dataTempRegister, 7);
+                    m_assembler.nop();
+                } else if (imm.m_value >= -32768 && imm.m_value < 0) {
+                    move(dataTempRegister, cmpTempRegister);
+                    m_assembler.addiu(dataTempRegister, dataTempRegister, imm.m_value);
+                    m_assembler.bgez(cmpTempRegister, 9);
+                    storeDest(dataTempRegister);
+                    m_assembler.bltz(cmpTempRegister, 7);
+                    m_assembler.nop();
+                } else {
+                    move(imm, immTempRegister);
+                    m_assembler.xorInsn(cmpTempRegister, dataTempRegister, immTempRegister);
+                    m_assembler.addu(dataTempRegister, dataTempRegister, immTempRegister);
+                    m_assembler.bltz(cmpTempRegister, 10);
+                    storeDest(dataTempRegister);
+                    m_assembler.xorInsn(cmpTempRegister, dataTempRegister, immTempRegister);
+                    m_assembler.bgez(cmpTempRegister, 7);
+                    m_assembler.nop();
+                }
+            }
+            return jump();
+        }
+        move(imm, immTempRegister);
+        loadDest(dataTempRegister);
+        add32(immTempRegister, dataTempRegister);
+        storeDest(dataTempRegister);
+        if (cond == Signed) {
+            // Check if dest is negative.
+            m_assembler.slt(cmpTempRegister, dataTempRegister, MIPSRegisters::zero);
+            return branchNotEqual(cmpTempRegister, MIPSRegisters::zero);
+        }
+        if (cond == PositiveOrZero) {
+            // Check if dest is not negative.
+            m_assembler.slt(cmpTempRegister, dataTempRegister, MIPSRegisters::zero);
+            return branchEqual(cmpTempRegister, MIPSRegisters::zero);
+        }
+        if (cond == Zero)
+            return branchEqual(dataTempRegister, MIPSRegisters::zero);
+        if (cond == NonZero)
+            return branchNotEqual(dataTempRegister, MIPSRegisters::zero);
+        ASSERT(0);
+        return Jump();
+    }
+
     Jump branchMul32(ResultCondition cond, RegisterID src1, RegisterID src2, RegisterID dest)
     {
         ASSERT((cond == Overflow) || (cond == Signed) || (cond == Zero) || (cond == NonZero));

Modified: trunk/Source/JavaScriptCore/bytecode/CodeBlock.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/bytecode/CodeBlock.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/bytecode/CodeBlock.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -169,7 +169,10 @@
     unsigned numTmps() const { return m_unlinkedCode->hasCheckpoints() * maxNumCheckpointTmps; }
 
     unsigned* addressOfNumParameters() { return &m_numParameters; }
+
+    static ptrdiff_t offsetOfNumCalleeLocals() { return OBJECT_OFFSETOF(CodeBlock, m_numCalleeLocals); }
     static ptrdiff_t offsetOfNumParameters() { return OBJECT_OFFSETOF(CodeBlock, m_numParameters); }
+    static ptrdiff_t offsetOfNumVars() { return OBJECT_OFFSETOF(CodeBlock, m_numVars); }
 
     CodeBlock* alternative() const { return static_cast<CodeBlock*>(m_alternative.get()); }
     void setAlternative(VM&, CodeBlock*);
@@ -486,6 +489,8 @@
         return result;
     }
 
+    static ptrdiff_t offsetOfArgumentValueProfiles() { return OBJECT_OFFSETOF(CodeBlock, m_argumentValueProfiles); }
+
     ValueProfile& valueProfileForBytecodeIndex(BytecodeIndex);
     SpeculatedType valueProfilePredictionForBytecodeIndex(const ConcurrentJSLocker&, BytecodeIndex);
 
@@ -819,7 +824,7 @@
     }
 
     bool wasCompiledWithDebuggingOpcodes() const { return m_unlinkedCode->wasCompiledWithDebuggingOpcodes(); }
-    
+
     // This is intentionally public; it's the responsibility of anyone doing any
     // of the following to hold the lock:
     //
@@ -906,6 +911,7 @@
 
     static ptrdiff_t offsetOfMetadataTable() { return OBJECT_OFFSETOF(CodeBlock, m_metadata); }
     static ptrdiff_t offsetOfInstructionsRawPointer() { return OBJECT_OFFSETOF(CodeBlock, m_instructionsRawPointer); }
+    static ptrdiff_t offsetOfShouldAlwaysBeInlined() { return OBJECT_OFFSETOF(CodeBlock, m_shouldAlwaysBeInlined); }
 
     bool loopHintsAreEligibleForFuzzingEarlyReturn()
     {

Modified: trunk/Source/JavaScriptCore/jit/AssemblyHelpers.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/jit/AssemblyHelpers.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/jit/AssemblyHelpers.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -326,6 +326,11 @@
         ASSERT(codeBlock);
 
         const RegisterAtOffsetList* calleeSaves = codeBlock->calleeSaveRegisters();
+        emitSaveCalleeSavesFor(calleeSaves);
+    }
+
+    void emitSaveCalleeSavesFor(const RegisterAtOffsetList* calleeSaves)
+    {
         RegisterSet dontSaveRegisters = RegisterSet(RegisterSet::stackRegisters(), RegisterSet::allFPRs());
         unsigned registerCount = calleeSaves->size();
 
@@ -399,6 +404,11 @@
         emitSaveCalleeSavesFor(codeBlock());
     }
 
+    void emitSaveCalleeSavesForBaselineJIT()
+    {
+        emitSaveCalleeSavesFor(&RegisterAtOffsetList::llintBaselineCalleeSaveRegisters());
+    }
+
     void emitSaveThenMaterializeTagRegisters()
     {
 #if USE(JSVALUE64)
@@ -417,6 +427,11 @@
         emitRestoreCalleeSavesFor(codeBlock());
     }
 
+    void emitRestoreCalleeSavesForBaselineJIT()
+    {
+        emitRestoreCalleeSavesFor(&RegisterAtOffsetList::llintBaselineCalleeSaveRegisters());
+    }
+
     void emitRestoreSavedTagRegisters()
     {
 #if USE(JSVALUE64)

Modified: trunk/Source/JavaScriptCore/jit/JIT.cpp (278575 => 278576)


--- trunk/Source/JavaScriptCore/jit/JIT.cpp	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/jit/JIT.cpp	2021-06-07 22:51:25 UTC (rev 278576)
@@ -55,6 +55,20 @@
 static constexpr const bool verbose = false;
 }
 
+#if ENABLE(EXTRA_CTI_THUNKS)
+#if CPU(ARM64) || (CPU(X86_64) && !OS(WINDOWS))
+// These are supported ports.
+#else
+// This is a courtesy reminder (and warning) that the implementation of EXTRA_CTI_THUNKS can
+// use up to 6 argument registers and/or 6/7 temp registers, and make use of ARM64 like
+// features. Hence, it may not work for many other ports without significant work. If you
+// plan on adding EXTRA_CTI_THUNKS support for your port, please remember to search the
+// EXTRA_CTI_THUNKS code for CPU(ARM64) and CPU(X86_64) conditional code, and add support
+// for your port there as well.
+#error "unsupported architecture"
+#endif
+#endif // ENABLE(EXTRA_CTI_THUNKS)
+
 Seconds totalBaselineCompileTime;
 Seconds totalDFGCompileTime;
 Seconds totalFTLCompileTime;
@@ -83,7 +97,7 @@
 {
 }
 
-#if ENABLE(DFG_JIT)
+#if ENABLE(DFG_JIT) && !ENABLE(EXTRA_CTI_THUNKS)
 void JIT::emitEnterOptimizationCheck()
 {
     if (!canBeOptimized())
@@ -101,7 +115,7 @@
     farJump(returnValueGPR, GPRInfo::callFrameRegister);
     skipOptimize.link(this);
 }
-#endif
+#endif // ENABLE(DFG_JIT) && !ENABLE(EXTRA_CTI_THUNKS)
 
 void JIT::emitNotifyWrite(WatchpointSet* set)
 {
@@ -682,6 +696,32 @@
 #endif
 }
 
+static inline unsigned prologueGeneratorSelector(bool doesProfiling, bool isConstructor, bool hasHugeFrame)
+{
+    return doesProfiling << 2 | isConstructor << 1 | hasHugeFrame << 0;
+}
+
+#define FOR_EACH_NON_PROFILING_PROLOGUE_GENERATOR(v) \
+    v(!doesProfiling, !isConstructor, !hasHugeFrame, prologueGenerator0, arityFixup_prologueGenerator0) \
+    v(!doesProfiling, !isConstructor,  hasHugeFrame, prologueGenerator1, arityFixup_prologueGenerator1) \
+    v(!doesProfiling,  isConstructor, !hasHugeFrame, prologueGenerator2, arityFixup_prologueGenerator2) \
+    v(!doesProfiling,  isConstructor,  hasHugeFrame, prologueGenerator3, arityFixup_prologueGenerator3)
+
+#if ENABLE(DFG_JIT)
+#define FOR_EACH_PROFILING_PROLOGUE_GENERATOR(v) \
+    v( doesProfiling, !isConstructor, !hasHugeFrame, prologueGenerator4, arityFixup_prologueGenerator4) \
+    v( doesProfiling, !isConstructor,  hasHugeFrame, prologueGenerator5, arityFixup_prologueGenerator5) \
+    v( doesProfiling,  isConstructor, !hasHugeFrame, prologueGenerator6, arityFixup_prologueGenerator6) \
+    v( doesProfiling,  isConstructor,  hasHugeFrame, prologueGenerator7, arityFixup_prologueGenerator7)
+
+#else // not ENABLE(DFG_JIT)
+#define FOR_EACH_PROFILING_PROLOGUE_GENERATOR(v)
+#endif // ENABLE(DFG_JIT)
+
+#define FOR_EACH_PROLOGUE_GENERATOR(v) \
+    FOR_EACH_NON_PROFILING_PROLOGUE_GENERATOR(v) \
+    FOR_EACH_PROFILING_PROLOGUE_GENERATOR(v)
+
 void JIT::compileAndLinkWithoutFinalizing(JITCompilationEffort effort)
 {
     DFG::CapabilityLevel level = m_codeBlock->capabilityLevel();
@@ -750,6 +790,8 @@
         nop();
 
     emitFunctionPrologue();
+
+#if !ENABLE(EXTRA_CTI_THUNKS)
     emitPutToCallFrameHeader(m_codeBlock, CallFrameSlot::codeBlock);
 
     Label beginLabel(this);
@@ -771,11 +813,10 @@
     if (m_codeBlock->codeType() == FunctionCode) {
         ASSERT(!m_bytecodeIndex);
         if (shouldEmitProfiling()) {
-            for (unsigned argument = 0; argument < m_codeBlock->numParameters(); ++argument) {
-                // If this is a constructor, then we want to put in a dummy profiling site (to
-                // keep things consistent) but we don't actually want to record the dummy value.
-                if (m_codeBlock->isConstructor() && !argument)
-                    continue;
+            // If this is a constructor, then we want to put in a dummy profiling site (to
+            // keep things consistent) but we don't actually want to record the dummy value.
+            unsigned startArgument = m_codeBlock->isConstructor() ? 1 : 0;
+            for (unsigned argument = startArgument; argument < m_codeBlock->numParameters(); ++argument) {
                 int offset = CallFrame::argumentOffsetIncludingThis(argument) * static_cast<int>(sizeof(Register));
 #if USE(JSVALUE64)
                 JSValueRegs resultRegs = JSValueRegs(regT0);
@@ -789,7 +830,34 @@
             }
         }
     }
-    
+#else // ENABLE(EXTRA_CTI_THUNKS)
+    constexpr GPRReg codeBlockGPR = regT7;
+    ASSERT(!m_bytecodeIndex);
+
+    int frameTopOffset = stackPointerOffsetFor(m_codeBlock) * sizeof(Register);
+    unsigned maxFrameSize = -frameTopOffset;
+
+    bool doesProfiling = (m_codeBlock->codeType() == FunctionCode) && shouldEmitProfiling();
+    bool isConstructor = m_codeBlock->isConstructor();
+    bool hasHugeFrame = maxFrameSize > Options::reservedZoneSize();
+
+    static constexpr ThunkGenerator generators[] = {
+#define USE_PROLOGUE_GENERATOR(doesProfiling, isConstructor, hasHugeFrame, name, arityFixupName) name,
+        FOR_EACH_PROLOGUE_GENERATOR(USE_PROLOGUE_GENERATOR)
+#undef USE_PROLOGUE_GENERATOR
+    };
+    static constexpr unsigned numberOfGenerators = sizeof(generators) / sizeof(generators[0]);
+
+    move(TrustedImmPtr(m_codeBlock), codeBlockGPR);
+
+    unsigned generatorSelector = prologueGeneratorSelector(doesProfiling, isConstructor, hasHugeFrame);
+    RELEASE_ASSERT(generatorSelector < numberOfGenerators);
+    auto generator = generators[generatorSelector];
+    emitNakedNearCall(vm().getCTIStub(generator).retaggedCode<NoPtrTag>());
+
+    Label bodyLabel(this);
+#endif // !ENABLE(EXTRA_CTI_THUNKS)
+
     RELEASE_ASSERT(!JITCode::isJIT(m_codeBlock->jitType()));
 
     if (UNLIKELY(sizeMarker))
@@ -803,16 +871,19 @@
         m_disassembler->setEndOfSlowPath(label());
     m_pcToCodeOriginMapBuilder.appendItem(label(), PCToCodeOriginMapBuilder::defaultCodeOrigin());
 
+#if !ENABLE(EXTRA_CTI_THUNKS)
     stackOverflow.link(this);
     m_bytecodeIndex = BytecodeIndex(0);
     if (maxFrameExtentForSlowPathCall)
         addPtr(TrustedImm32(-static_cast<int32_t>(maxFrameExtentForSlowPathCall)), stackPointerRegister);
     callOperationWithCallFrameRollbackOnException(operationThrowStackOverflowError, m_codeBlock);
+#endif
 
     // If the number of parameters is 1, we never require arity fixup.
     bool requiresArityFixup = m_codeBlock->m_numParameters != 1;
     if (m_codeBlock->codeType() == FunctionCode && requiresArityFixup) {
         m_arityCheck = label();
+#if !ENABLE(EXTRA_CTI_THUNKS)
         store8(TrustedImm32(0), &m_codeBlock->m_shouldAlwaysBeInlined);
         emitFunctionPrologue();
         emitPutToCallFrameHeader(m_codeBlock, CallFrameSlot::codeBlock);
@@ -831,17 +902,42 @@
         move(returnValueGPR, GPRInfo::argumentGPR0);
         emitNakedNearCall(m_vm->getCTIStub(arityFixupGenerator).retaggedCode<NoPtrTag>());
 
+        jump(beginLabel);
+
+#else // ENABLE(EXTRA_CTI_THUNKS)
+        emitFunctionPrologue();
+
+        static_assert(codeBlockGPR == regT7);
+        ASSERT(!m_bytecodeIndex);
+
+        static constexpr ThunkGenerator generators[] = {
+#define USE_PROLOGUE_GENERATOR(doesProfiling, isConstructor, hasHugeFrame, name, arityFixupName) arityFixupName,
+            FOR_EACH_PROLOGUE_GENERATOR(USE_PROLOGUE_GENERATOR)
+#undef USE_PROLOGUE_GENERATOR
+        };
+        static constexpr unsigned numberOfGenerators = sizeof(generators) / sizeof(generators[0]);
+
+        move(TrustedImmPtr(m_codeBlock), codeBlockGPR);
+
+        RELEASE_ASSERT(generatorSelector < numberOfGenerators);
+        auto generator = generators[generatorSelector];
+        RELEASE_ASSERT(generator);
+        emitNakedNearCall(vm().getCTIStub(generator).retaggedCode<NoPtrTag>());
+
+        jump(bodyLabel);
+#endif // !ENABLE(EXTRA_CTI_THUNKS)
+
 #if ASSERT_ENABLED
         m_bytecodeIndex = BytecodeIndex(); // Reset this, in order to guard its use with ASSERTs.
 #endif
-
-        jump(beginLabel);
     } else
         m_arityCheck = entryLabel; // Never require arity fixup.
 
     ASSERT(m_jmpTable.isEmpty());
     
+#if !ENABLE(EXTRA_CTI_THUNKS)
     privateCompileExceptionHandlers();
+#endif
     
     if (m_disassembler)
         m_disassembler->setEndOfCode(label());
@@ -851,6 +947,241 @@
     link();
 }
 
+#if ENABLE(EXTRA_CTI_THUNKS)
+MacroAssemblerCodeRef<JITThunkPtrTag> JIT::prologueGenerator(VM& vm, bool doesProfiling, bool isConstructor, bool hasHugeFrame, const char* thunkName)
+{
+    // This function generates the Baseline JIT's prologue code. It is not useable by other tiers.
+    constexpr GPRReg codeBlockGPR = regT7; // incoming.
+
+    constexpr int virtualRegisterSize = static_cast<int>(sizeof(Register));
+    constexpr int virtualRegisterSizeShift = 3;
+    static_assert((1 << virtualRegisterSizeShift) == virtualRegisterSize);
+
+    tagReturnAddress();
+
+    storePtr(codeBlockGPR, addressFor(CallFrameSlot::codeBlock));
+
+    load32(Address(codeBlockGPR, CodeBlock::offsetOfNumCalleeLocals()), regT1);
+    if constexpr (maxFrameExtentForSlowPathCallInRegisters)
+        add32(TrustedImm32(maxFrameExtentForSlowPathCallInRegisters), regT1);
+    lshift32(TrustedImm32(virtualRegisterSizeShift), regT1);
+    neg64(regT1);
+#if ASSERT_ENABLED
+    Probe::Function probeFunction = [] (Probe::Context& context) {
+        CodeBlock* codeBlock = context.fp<CallFrame*>()->codeBlock();
+        int64_t frameTopOffset = stackPointerOffsetFor(codeBlock) * sizeof(Register);
+        RELEASE_ASSERT(context.gpr<intptr_t>(regT1) == frameTopOffset);
+    };
+    probe(tagCFunctionPtr<JITProbePtrTag>(probeFunction), nullptr);
+#endif
+
+    addPtr(callFrameRegister, regT1);
+
+    JumpList stackOverflow;
+    if (hasHugeFrame)
+        stackOverflow.append(branchPtr(Above, regT1, callFrameRegister));
+    stackOverflow.append(branchPtr(Above, AbsoluteAddress(vm.addressOfSoftStackLimit()), regT1));
+
+    // We'll be imminently returning with a `retab` (ARM64E's return with authentication
+    // using the B key) in the normal path (see MacroAssemblerARM64E's implementation of
+    // ret()), which will do validation. So, extra validation here is redundant and unnecessary.
+    untagReturnAddressWithoutExtraValidation();
+#if CPU(X86_64)
+    pop(regT2); // Save the return address.
+#endif
+    move(regT1, stackPointerRegister);
+    tagReturnAddress();
+    checkStackPointerAlignment();
+#if CPU(X86_64)
+    push(regT2); // Restore the return address.
+#endif
+
+    emitSaveCalleeSavesForBaselineJIT();
+    emitMaterializeTagCheckRegisters();
+
+    if (doesProfiling) {
+        constexpr GPRReg argumentValueProfileGPR = regT6;
+        constexpr GPRReg numParametersGPR = regT5;
+        constexpr GPRReg argumentGPR = regT4;
+
+        load32(Address(codeBlockGPR, CodeBlock::offsetOfNumParameters()), numParametersGPR);
+        loadPtr(Address(codeBlockGPR, CodeBlock::offsetOfArgumentValueProfiles()), argumentValueProfileGPR);
+        if (isConstructor)
+            addPtr(TrustedImm32(sizeof(ValueProfile)), argumentValueProfileGPR);
+
+        int startArgument = CallFrameSlot::thisArgument + (isConstructor ? 1 : 0);
+        int startArgumentOffset = startArgument * virtualRegisterSize;
+        move(TrustedImm64(startArgumentOffset), argumentGPR);
+
+        add32(TrustedImm32(static_cast<int>(CallFrameSlot::thisArgument)), numParametersGPR);
+        lshift32(TrustedImm32(virtualRegisterSizeShift), numParametersGPR);
+
+        addPtr(callFrameRegister, argumentGPR);
+        addPtr(callFrameRegister, numParametersGPR);
+
+        Label loopStart(this);
+        Jump done = branchPtr(AboveOrEqual, argumentGPR, numParametersGPR);
+        {
+            load64(Address(argumentGPR), regT0);
+            store64(regT0, Address(argumentValueProfileGPR, OBJECT_OFFSETOF(ValueProfile, m_buckets)));
+
+            // The argument ValueProfiles are stored in a FixedVector. Hence, the
+            // address of the next profile can be trivially computed with an increment.
+            addPtr(TrustedImm32(sizeof(ValueProfile)), argumentValueProfileGPR);
+            addPtr(TrustedImm32(virtualRegisterSize), argumentGPR);
+            jump().linkTo(loopStart, this);
+        }
+        done.link(this);
+    }
+    ret();
+
+    stackOverflow.link(this);
+#if CPU(X86_64)
+    addPtr(TrustedImm32(1 * sizeof(CPURegister)), stackPointerRegister); // discard return address.
+#endif
+
+    uint32_t locationBits = CallSiteIndex(0).bits();
+    store32(TrustedImm32(locationBits), tagFor(CallFrameSlot::argumentCountIncludingThis));
+
+    if (maxFrameExtentForSlowPathCall)
+        addPtr(TrustedImm32(-static_cast<int32_t>(maxFrameExtentForSlowPathCall)), stackPointerRegister);
+
+    setupArguments<decltype(operationThrowStackOverflowError)>(codeBlockGPR);
+    prepareCallOperation(vm);
+    MacroAssembler::Call operationCall = call(OperationPtrTag);
+    Jump handleExceptionJump = jump();
+
+    auto handler = vm.getCTIStub(handleExceptionWithCallFrameRollbackGenerator);
+
+    LinkBuffer patchBuffer(*this, GLOBAL_THUNK_ID, LinkBuffer::Profile::ExtraCTIThunk);
+    patchBuffer.link(operationCall, FunctionPtr<OperationPtrTag>(operationThrowStackOverflowError));
+    patchBuffer.link(handleExceptionJump, CodeLocationLabel(handler.retaggedCode<NoPtrTag>()));
+    return FINALIZE_CODE(patchBuffer, JITThunkPtrTag, thunkName);
+}
+
+static constexpr bool doesProfiling = true;
+static constexpr bool isConstructor = true;
+static constexpr bool hasHugeFrame = true;
+
+#define DEFINE_PROGLOGUE_GENERATOR(doesProfiling, isConstructor, hasHugeFrame, name, arityFixupName) \
+    MacroAssemblerCodeRef<JITThunkPtrTag> JIT::name(VM& vm) \
+    { \
+        JIT jit(vm); \
+        return jit.prologueGenerator(vm, doesProfiling, isConstructor, hasHugeFrame, "Baseline: " #name); \
+    }
+
+FOR_EACH_PROLOGUE_GENERATOR(DEFINE_PROGLOGUE_GENERATOR)
+#undef DEFINE_PROGLOGUE_GENERATOR
+
+MacroAssemblerCodeRef<JITThunkPtrTag> JIT::arityFixupPrologueGenerator(VM& vm, bool isConstructor, ThunkGenerator normalPrologueGenerator, const char* thunkName)
+{
+    // This function generates the Baseline JIT's prologue code. It is not useable by other tiers.
+    constexpr GPRReg codeBlockGPR = regT7; // incoming.
+    constexpr GPRReg numParametersGPR = regT6;
+
+    tagReturnAddress();
+#if CPU(X86_64)
+    push(framePointerRegister);
+#elif CPU(ARM64)
+    pushPair(framePointerRegister, linkRegister);
+#endif
+
+    storePtr(codeBlockGPR, addressFor(CallFrameSlot::codeBlock));
+    store8(TrustedImm32(0), Address(codeBlockGPR, CodeBlock::offsetOfShouldAlwaysBeInlined()));
+
+    load32(payloadFor(CallFrameSlot::argumentCountIncludingThis), regT1);
+    load32(Address(codeBlockGPR, CodeBlock::offsetOfNumParameters()), numParametersGPR);
+    Jump noFixupNeeded = branch32(AboveOrEqual, regT1, numParametersGPR);
+
+    if constexpr (maxFrameExtentForSlowPathCall)
+        addPtr(TrustedImm32(-static_cast<int32_t>(maxFrameExtentForSlowPathCall)), stackPointerRegister);
+
+    loadPtr(Address(codeBlockGPR, CodeBlock::offsetOfGlobalObject()), argumentGPR0);
+
+    static_assert(std::is_same<decltype(operationConstructArityCheck), decltype(operationCallArityCheck)>::value);
+    setupArguments<decltype(operationCallArityCheck)>(argumentGPR0);
+    prepareCallOperation(vm);
+
+    MacroAssembler::Call arityCheckCall = call(OperationPtrTag);
+    Jump handleExceptionJump = emitNonPatchableExceptionCheck(vm);
+
+    if constexpr (maxFrameExtentForSlowPathCall)
+        addPtr(TrustedImm32(maxFrameExtentForSlowPathCall), stackPointerRegister);
+    Jump needFixup = branchTest32(NonZero, returnValueGPR);
+    noFixupNeeded.link(this);
+
+    // The normal prologue expects incoming codeBlockGPR.
+    load64(addressFor(CallFrameSlot::codeBlock), codeBlockGPR);
+
+#if CPU(X86_64)
+    pop(framePointerRegister);
+#elif CPU(ARM64)
+    popPair(framePointerRegister, linkRegister);
+#endif
+    untagReturnAddress();
+
+    JumpList normalPrologueJump;
+    normalPrologueJump.append(jump());
+
+    needFixup.link(this);
+
+    // Restore the stack for arity fixup, and preserve the return address.
+    // arityFixupGenerator will be shifting the stack. So, we can't use the stack to
+    // preserve the return address. We also can't use callee saved registers because
+    // they haven't been saved yet.
+    //
+    // arityFixupGenerator is carefully crafted to only use a0, a1, a2, t3, t4 and t5.
+    // So, the return address can be preserved in regT7.
+#if CPU(X86_64)
+    pop(argumentGPR2); // discard.
+    pop(regT7); // save return address.
+#elif CPU(ARM64)
+    popPair(framePointerRegister, linkRegister);
+    untagReturnAddress();
+    move(linkRegister, regT7);
+    auto randomReturnAddressTag = random();
+    move(TrustedImm32(randomReturnAddressTag), regT1);
+    tagPtr(regT1, regT7);
+#endif
+    move(returnValueGPR, GPRInfo::argumentGPR0);
+    Call arityFixupCall = nearCall();
+
+#if CPU(X86_64)
+    push(regT7); // restore return address.
+#elif CPU(ARM64)
+    move(TrustedImm32(randomReturnAddressTag), regT1);
+    untagPtr(regT1, regT7);
+    move(regT7, linkRegister);
+#endif
+
+    load64(addressFor(CallFrameSlot::codeBlock), codeBlockGPR);
+    normalPrologueJump.append(jump());
+
+    auto arityCheckOperation = isConstructor ? operationConstructArityCheck : operationCallArityCheck;
+    auto arityFixup = vm.getCTIStub(arityFixupGenerator);
+    auto normalPrologue = vm.getCTIStub(normalPrologueGenerator);
+    auto exceptionHandler = vm.getCTIStub(popThunkStackPreservesAndHandleExceptionGenerator);
+
+    LinkBuffer patchBuffer(*this, GLOBAL_THUNK_ID, LinkBuffer::Profile::ExtraCTIThunk);
+    patchBuffer.link(arityCheckCall, FunctionPtr<OperationPtrTag>(arityCheckOperation));
+    patchBuffer.link(arityFixupCall, FunctionPtr(arityFixup.retaggedCode<NoPtrTag>()));
+    patchBuffer.link(normalPrologueJump, CodeLocationLabel(normalPrologue.retaggedCode<NoPtrTag>()));
+    patchBuffer.link(handleExceptionJump, CodeLocationLabel(exceptionHandler.retaggedCode<NoPtrTag>()));
+    return FINALIZE_CODE(patchBuffer, JITThunkPtrTag, thunkName);
+}
+
+#define DEFINE_ARITY_PROGLOGUE_GENERATOR(doesProfiling, isConstructor, hasHugeFrame, name, arityFixupName) \
+MacroAssemblerCodeRef<JITThunkPtrTag> JIT::arityFixupName(VM& vm) \
+    { \
+        JIT jit(vm); \
+        return jit.arityFixupPrologueGenerator(vm, isConstructor, name, "Baseline: " #arityFixupName); \
+    }
+
+FOR_EACH_PROLOGUE_GENERATOR(DEFINE_ARITY_PROGLOGUE_GENERATOR)
+#undef DEFINE_ARITY_PROGLOGUE_GENERATOR
+
+#endif // ENABLE(EXTRA_CTI_THUNKS)
+
 void JIT::link()
 {
     LinkBuffer& patchBuffer = *m_linkBuffer;
@@ -1046,9 +1377,9 @@
     return finalizeOnMainThread();
 }
 
+#if !ENABLE(EXTRA_CTI_THUNKS)
 void JIT::privateCompileExceptionHandlers()
 {
-#if !ENABLE(EXTRA_CTI_THUNKS)
     if (!m_exceptionChecksWithCallFrameRollback.empty()) {
         m_exceptionChecksWithCallFrameRollback.link(this);
 
@@ -1073,8 +1404,8 @@
         m_farCalls.append(FarCallRecord(call(OperationPtrTag), FunctionPtr<OperationPtrTag>(operationLookupExceptionHandler)));
         jumpToExceptionHandler(vm());
     }
-#endif // ENABLE(EXTRA_CTI_THUNKS)
 }
+#endif // !ENABLE(EXTRA_CTI_THUNKS)
 
 void JIT::doMainThreadPreparationBeforeCompile()
 {

Modified: trunk/Source/JavaScriptCore/jit/JIT.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/jit/JIT.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/jit/JIT.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -318,7 +318,9 @@
             m_exceptionChecksWithCallFrameRollback.append(emitExceptionCheck(vm()));
         }
 
+#if !ENABLE(EXTRA_CTI_THUNKS)
         void privateCompileExceptionHandlers();
+#endif
 
         void advanceToNextCheckpoint();
         void emitJumpSlowToHotForCheckpoint(Jump);
@@ -790,6 +792,26 @@
 
 #if ENABLE(EXTRA_CTI_THUNKS)
         // Thunk generators.
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator0(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator1(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator2(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator3(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator4(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator5(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator6(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator7(VM&);
+        MacroAssemblerCodeRef<JITThunkPtrTag> prologueGenerator(VM&, bool doesProfiling, bool isConstructor, bool hasHugeFrame, const char* name);
+
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator0(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator1(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator2(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator3(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator4(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator5(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator6(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> arityFixup_prologueGenerator7(VM&);
+        MacroAssemblerCodeRef<JITThunkPtrTag> arityFixupPrologueGenerator(VM&, bool isConstructor, ThunkGenerator normalPrologueGenerator, const char* name);
+
         static MacroAssemblerCodeRef<JITThunkPtrTag> slow_op_del_by_id_prepareCallGenerator(VM&);
         static MacroAssemblerCodeRef<JITThunkPtrTag> slow_op_del_by_val_prepareCallGenerator(VM&);
         static MacroAssemblerCodeRef<JITThunkPtrTag> slow_op_get_by_id_prepareCallGenerator(VM&);
@@ -804,7 +826,14 @@
         static MacroAssemblerCodeRef<JITThunkPtrTag> slow_op_resolve_scopeGenerator(VM&);
 
         static MacroAssemblerCodeRef<JITThunkPtrTag> op_check_traps_handlerGenerator(VM&);
-        static MacroAssemblerCodeRef<JITThunkPtrTag> op_enter_handlerGenerator(VM&);
+
+        static MacroAssemblerCodeRef<JITThunkPtrTag> op_enter_canBeOptimized_Generator(VM&);
+        static MacroAssemblerCodeRef<JITThunkPtrTag> op_enter_cannotBeOptimized_Generator(VM&);
+        MacroAssemblerCodeRef<JITThunkPtrTag> op_enter_Generator(VM&, bool canBeOptimized, const char* thunkName);
+
+#if ENABLE(DFG_JIT)
+        static MacroAssemblerCodeRef<JITThunkPtrTag> op_loop_hint_Generator(VM&);
+#endif
         static MacroAssemblerCodeRef<JITThunkPtrTag> op_ret_handlerGenerator(VM&);
         static MacroAssemblerCodeRef<JITThunkPtrTag> op_throw_handlerGenerator(VM&);
 

Modified: trunk/Source/JavaScriptCore/jit/JITInlines.h (278575 => 278576)


--- trunk/Source/JavaScriptCore/jit/JITInlines.h	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/jit/JITInlines.h	2021-06-07 22:51:25 UTC (rev 278576)
@@ -91,7 +91,6 @@
 
 ALWAYS_INLINE JIT::Call JIT::emitNakedNearCall(CodePtr<NoPtrTag> target)
 {
-    ASSERT(m_bytecodeIndex); // This method should only be called during hot/cold path generation, so that m_bytecodeIndex is set.
     Call nakedCall = nearCall();
     m_nearCalls.append(NearCallRecord(nakedCall, FunctionPtr<JSInternalPtrTag>(target.retagged<JSInternalPtrTag>())));
     return nakedCall;

Modified: trunk/Source/JavaScriptCore/jit/JITOpcodes.cpp (278575 => 278576)


--- trunk/Source/JavaScriptCore/jit/JITOpcodes.cpp	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/jit/JITOpcodes.cpp	2021-06-07 22:51:25 UTC (rev 278576)
@@ -376,7 +376,7 @@
     JIT jit(vm);
 
     jit.checkStackPointerAlignment();
-    jit.emitRestoreCalleeSavesFor(&RegisterAtOffsetList::llintBaselineCalleeSaveRegisters());
+    jit.emitRestoreCalleeSavesForBaselineJIT();
     jit.emitFunctionEpilogue();
     jit.ret();
 
@@ -1186,104 +1186,116 @@
     emitEnterOptimizationCheck();
 #else
     ASSERT(m_bytecodeIndex.offset() == 0);
-    constexpr GPRReg localsToInitGPR = argumentGPR0;
-    constexpr GPRReg canBeOptimizedGPR = argumentGPR4;
-
     unsigned localsToInit = count - CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters();
     RELEASE_ASSERT(localsToInit < count);
-    move(TrustedImm32(localsToInit * sizeof(Register)), localsToInitGPR);
-    move(TrustedImm32(canBeOptimized()), canBeOptimizedGPR);
-    emitNakedNearCall(vm().getCTIStub(op_enter_handlerGenerator).retaggedCode<NoPtrTag>());
+    ThunkGenerator generator = canBeOptimized() ? op_enter_canBeOptimized_Generator : op_enter_cannotBeOptimized_Generator;
+    emitNakedNearCall(vm().getCTIStub(generator).retaggedCode<NoPtrTag>());
 #endif // ENABLE(EXTRA_CTI_THUNKS)
 }
 
 #if ENABLE(EXTRA_CTI_THUNKS)
-MacroAssemblerCodeRef<JITThunkPtrTag> JIT::op_enter_handlerGenerator(VM& vm)
+MacroAssemblerCodeRef<JITThunkPtrTag> JIT::op_enter_Generator(VM& vm, bool canBeOptimized, const char* thunkName)
 {
-    JIT jit(vm);
-
 #if CPU(X86_64)
-    jit.push(X86Registers::ebp);
+    push(X86Registers::ebp);
 #elif CPU(ARM64)
-    jit.tagReturnAddress();
-    jit.pushPair(framePointerRegister, linkRegister);
+    tagReturnAddress();
+    pushPair(framePointerRegister, linkRegister);
 #endif
     // op_enter is always at bytecodeOffset 0.
-    jit.store32(TrustedImm32(0), tagFor(CallFrameSlot::argumentCountIncludingThis));
+    store32(TrustedImm32(0), tagFor(CallFrameSlot::argumentCountIncludingThis));
 
     constexpr GPRReg localsToInitGPR = argumentGPR0;
     constexpr GPRReg iteratorGPR = argumentGPR1;
     constexpr GPRReg endGPR = argumentGPR2;
     constexpr GPRReg undefinedGPR = argumentGPR3;
-    constexpr GPRReg canBeOptimizedGPR = argumentGPR4;
+    constexpr GPRReg codeBlockGPR = argumentGPR4;
 
+    constexpr int virtualRegisterSizeShift = 3;
+    static_assert((1 << virtualRegisterSizeShift) == sizeof(Register));
+
+    loadPtr(addressFor(CallFrameSlot::codeBlock), codeBlockGPR);
+    load32(Address(codeBlockGPR, CodeBlock::offsetOfNumVars()), localsToInitGPR);
+    sub32(TrustedImm32(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters()), localsToInitGPR);
+    lshift32(TrustedImm32(virtualRegisterSizeShift), localsToInitGPR);
+
     size_t startLocal = CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters();
     int startOffset = virtualRegisterForLocal(startLocal).offset();
-    jit.move(TrustedImm64(startOffset * sizeof(Register)), iteratorGPR);
-    jit.sub64(iteratorGPR, localsToInitGPR, endGPR);
+    move(TrustedImm64(startOffset * sizeof(Register)), iteratorGPR);
+    sub64(iteratorGPR, localsToInitGPR, endGPR);
 
-    jit.move(TrustedImm64(JSValue::encode(jsUndefined())), undefinedGPR);
-    auto initLoop = jit.label();
-    Jump initDone = jit.branch32(LessThanOrEqual, iteratorGPR, endGPR);
+    move(TrustedImm64(JSValue::encode(jsUndefined())), undefinedGPR);
+    auto initLoop = label();
+    Jump initDone = branch32(LessThanOrEqual, iteratorGPR, endGPR);
     {
-        jit.store64(undefinedGPR, BaseIndex(GPRInfo::callFrameRegister, iteratorGPR, TimesOne));
-        jit.sub64(TrustedImm32(sizeof(Register)), iteratorGPR);
-        jit.jump(initLoop);
+        store64(undefinedGPR, BaseIndex(GPRInfo::callFrameRegister, iteratorGPR, TimesOne));
+        sub64(TrustedImm32(sizeof(Register)), iteratorGPR);
+        jump(initLoop);
     }
-    initDone.link(&jit);
+    initDone.link(this);
 
-    // emitWriteBarrier(m_codeBlock).
-    jit.loadPtr(addressFor(CallFrameSlot::codeBlock), argumentGPR1);
-    Jump ownerIsRememberedOrInEden = jit.barrierBranch(vm, argumentGPR1, argumentGPR2);
+    // Implementing emitWriteBarrier(m_codeBlock).
+    Jump ownerIsRememberedOrInEden = barrierBranch(vm, codeBlockGPR, argumentGPR2);
 
-    jit.move(canBeOptimizedGPR, GPRInfo::numberTagRegister); // save.
-    jit.setupArguments<decltype(operationWriteBarrierSlowPath)>(&vm, argumentGPR1);
-    jit.prepareCallOperation(vm);
-    Call operationWriteBarrierCall = jit.call(OperationPtrTag);
+    setupArguments<decltype(operationWriteBarrierSlowPath)>(&vm, codeBlockGPR);
+    prepareCallOperation(vm);
+    Call operationWriteBarrierCall = call(OperationPtrTag);
 
-    jit.move(GPRInfo::numberTagRegister, canBeOptimizedGPR); // restore.
-    jit.move(TrustedImm64(JSValue::NumberTag), GPRInfo::numberTagRegister);
-    ownerIsRememberedOrInEden.link(&jit);
+    if (canBeOptimized)
+        loadPtr(addressFor(CallFrameSlot::codeBlock), codeBlockGPR);
 
+    ownerIsRememberedOrInEden.link(this);
+
 #if ENABLE(DFG_JIT)
+    // Implementing emitEnterOptimizationCheck().
     Call operationOptimizeCall;
-    if (Options::useDFGJIT()) {
-        // emitEnterOptimizationCheck().
+    if (canBeOptimized) {
         JumpList skipOptimize;
 
-        skipOptimize.append(jit.branchTest32(Zero, canBeOptimizedGPR));
+        skipOptimize.append(branchAdd32(Signed, TrustedImm32(Options::executionCounterIncrementForEntry()), Address(codeBlockGPR, CodeBlock::offsetOfJITExecuteCounter())));
 
-        jit.loadPtr(addressFor(CallFrameSlot::codeBlock), argumentGPR1);
-        skipOptimize.append(jit.branchAdd32(Signed, TrustedImm32(Options::executionCounterIncrementForEntry()), Address(argumentGPR1, CodeBlock::offsetOfJITExecuteCounter())));
+        copyLLIntBaselineCalleeSavesFromFrameOrRegisterToEntryFrameCalleeSavesBuffer(vm.topEntryFrame);
 
-        jit.copyLLIntBaselineCalleeSavesFromFrameOrRegisterToEntryFrameCalleeSavesBuffer(vm.topEntryFrame);
+        setupArguments<decltype(operationOptimize)>(&vm, TrustedImm32(0));
+        prepareCallOperation(vm);
+        operationOptimizeCall = call(OperationPtrTag);
 
-        jit.setupArguments<decltype(operationOptimize)>(&vm, TrustedImm32(0));
-        jit.prepareCallOperation(vm);
-        operationOptimizeCall = jit.call(OperationPtrTag);
+        skipOptimize.append(branchTestPtr(Zero, returnValueGPR));
+        farJump(returnValueGPR, GPRInfo::callFrameRegister);
 
-        skipOptimize.append(jit.branchTestPtr(Zero, returnValueGPR));
-        jit.farJump(returnValueGPR, GPRInfo::callFrameRegister);
-
-        skipOptimize.link(&jit);
+        skipOptimize.link(this);
     }
 #endif // ENABLE(DFG_JIT)
 
 #if CPU(X86_64)
-    jit.pop(X86Registers::ebp);
+    pop(X86Registers::ebp);
 #elif CPU(ARM64)
-    jit.popPair(framePointerRegister, linkRegister);
+    popPair(framePointerRegister, linkRegister);
 #endif
-    jit.ret();
+    ret();
 
-    LinkBuffer patchBuffer(jit, GLOBAL_THUNK_ID, LinkBuffer::Profile::ExtraCTIThunk);
+    LinkBuffer patchBuffer(*this, GLOBAL_THUNK_ID, LinkBuffer::Profile::ExtraCTIThunk);
     patchBuffer.link(operationWriteBarrierCall, FunctionPtr<OperationPtrTag>(operationWriteBarrierSlowPath));
 #if ENABLE(DFG_JIT)
-    if (Options::useDFGJIT())
+    if (canBeOptimized)
         patchBuffer.link(operationOptimizeCall, FunctionPtr<OperationPtrTag>(operationOptimize));
 #endif
-    return FINALIZE_CODE(patchBuffer, JITThunkPtrTag, "Baseline: op_enter_handler");
+    return FINALIZE_CODE(patchBuffer, JITThunkPtrTag, thunkName);
 }
+
+MacroAssemblerCodeRef<JITThunkPtrTag> JIT::op_enter_canBeOptimized_Generator(VM& vm)
+{
+    JIT jit(vm);
+    constexpr bool canBeOptimized = true;
+    return jit.op_enter_Generator(vm, canBeOptimized, "Baseline: op_enter_canBeOptimized");
+}
+
+MacroAssemblerCodeRef<JITThunkPtrTag> JIT::op_enter_cannotBeOptimized_Generator(VM& vm)
+{
+    JIT jit(vm);
+    constexpr bool canBeOptimized = false;
+    return jit.op_enter_Generator(vm, canBeOptimized, "Baseline: op_enter_cannotBeOptimized");
+}
 #endif // ENABLE(EXTRA_CTI_THUNKS)
 
 void JIT::emit_op_get_scope(const Instruction* currentInstruction)
@@ -1434,16 +1446,20 @@
         add64(TrustedImm32(1), regT0);
         store64(regT0, ptr);
     }
+#else
+    UNUSED_PARAM(instruction);
 #endif
 
-    // Emit the JIT optimization check: 
+    // Emit the JIT optimization check:
     if (canBeOptimized()) {
+        constexpr GPRReg codeBlockGPR = regT0;
+        loadPtr(addressFor(CallFrameSlot::codeBlock), codeBlockGPR);
         addSlowCase(branchAdd32(PositiveOrZero, TrustedImm32(Options::executionCounterIncrementForLoop()),
-            AbsoluteAddress(m_codeBlock->addressOfJITExecuteCounter())));
+            Address(codeBlockGPR, CodeBlock::offsetOfJITExecuteCounter())));
     }
 }
 
-void JIT::emitSlow_op_loop_hint(const Instruction* currentInstruction, Vector<SlowCaseEntry>::iterator& iter)
+void JIT::emitSlow_op_loop_hint(const Instruction* instruction, Vector<SlowCaseEntry>::iterator& iter)
 {
 #if ENABLE(DFG_JIT)
     // Emit the slow path for the JIT optimization check:
@@ -1450,6 +1466,7 @@
     if (canBeOptimized()) {
         linkAllSlowCases(iter);
 
+#if !ENABLE(EXTRA_CTI_THUNKS)
         copyLLIntBaselineCalleeSavesFromFrameOrRegisterToEntryFrameCalleeSavesBuffer(vm().topEntryFrame);
 
         callOperationNoExceptionCheck(operationOptimize, &vm(), m_bytecodeIndex.asBits());
@@ -1462,13 +1479,81 @@
         farJump(returnValueGPR, GPRInfo::callFrameRegister);
         noOptimizedEntry.link(this);
 
-        emitJumpSlowToHot(jump(), currentInstruction->size());
+#else // ENABLE(EXTRA_CTI_THUNKS)
+        uint32_t bytecodeOffset = m_bytecodeIndex.offset();
+        ASSERT(BytecodeIndex(bytecodeOffset) == m_bytecodeIndex);
+        ASSERT(m_codeBlock->instructionAt(m_bytecodeIndex) == instruction);
+
+        constexpr GPRReg bytecodeOffsetGPR = regT7;
+
+        move(TrustedImm32(bytecodeOffset), bytecodeOffsetGPR);
+        emitNakedNearCall(vm().getCTIStub(op_loop_hint_Generator).retaggedCode<NoPtrTag>());
+#endif // !ENABLE(EXTRA_CTI_THUNKS)
     }
-#else
-    UNUSED_PARAM(currentInstruction);
+#endif // ENABLE(DFG_JIT)
     UNUSED_PARAM(iter);
+    UNUSED_PARAM(instruction);
+}
+
+#if ENABLE(EXTRA_CTI_THUNKS)
+
+#if ENABLE(DFG_JIT)
+MacroAssemblerCodeRef<JITThunkPtrTag> JIT::op_loop_hint_Generator(VM& vm)
+{
+    // The thunk generated by this function can only work with the LLInt / Baseline JIT because
+    // it makes assumptions about the right globalObject being available from CallFrame::codeBlock().
+    // DFG/FTL may inline functions belonging to other globalObjects, which may not match
+    // CallFrame::codeBlock().
+    JIT jit(vm);
+
+    jit.tagReturnAddress();
+
+    constexpr GPRReg bytecodeOffsetGPR = regT7; // incoming.
+
+#if CPU(X86_64)
+    jit.push(framePointerRegister);
+#elif CPU(ARM64)
+    jit.pushPair(framePointerRegister, linkRegister);
 #endif
+
+    auto usedRegisters = RegisterSet::stubUnavailableRegisters();
+    usedRegisters.add(bytecodeOffsetGPR);
+    jit.copyLLIntBaselineCalleeSavesFromFrameOrRegisterToEntryFrameCalleeSavesBuffer(vm.topEntryFrame, usedRegisters);
+
+    jit.store32(bytecodeOffsetGPR, CCallHelpers::tagFor(CallFrameSlot::argumentCountIncludingThis));
+    jit.lshift32(TrustedImm32(BytecodeIndex::checkpointShift), bytecodeOffsetGPR);
+    jit.setupArguments<decltype(operationOptimize)>(TrustedImmPtr(&vm), bytecodeOffsetGPR);
+    jit.prepareCallOperation(vm);
+    Call operationCall = jit.call(OperationPtrTag);
+    Jump hasOptimizedEntry = jit.branchTestPtr(NonZero, returnValueGPR);
+
+#if CPU(X86_64)
+    jit.pop(framePointerRegister);
+#elif CPU(ARM64)
+    jit.popPair(framePointerRegister, linkRegister);
+#endif
+    jit.ret();
+
+    hasOptimizedEntry.link(&jit);
+#if CPU(X86_64)
+    jit.addPtr(CCallHelpers::TrustedImm32(2 * sizeof(CPURegister)), stackPointerRegister);
+#elif CPU(ARM64)
+    jit.popPair(framePointerRegister, linkRegister);
+#endif
+    if (ASSERT_ENABLED) {
+        Jump ok = jit.branchPtr(MacroAssembler::Above, returnValueGPR, TrustedImmPtr(bitwise_cast<void*>(static_cast<intptr_t>(1000))));
+        jit.abortWithReason(JITUnreasonableLoopHintJumpTarget);
+        ok.link(&jit);
+    }
+
+    jit.farJump(returnValueGPR, GPRInfo::callFrameRegister);
+
+    LinkBuffer patchBuffer(jit, GLOBAL_THUNK_ID, LinkBuffer::Profile::ExtraCTIThunk);
+    patchBuffer.link(operationCall, FunctionPtr<OperationPtrTag>(operationOptimize));
+    return FINALIZE_CODE(patchBuffer, JITThunkPtrTag, "Baseline: op_loop_hint");
 }
+#endif // ENABLE(DFG_JIT)
+#endif // !ENABLE(EXTRA_CTI_THUNKS)
 
 void JIT::emit_op_check_traps(const Instruction*)
 {

Modified: trunk/Source/JavaScriptCore/jit/JITOpcodes32_64.cpp (278575 => 278576)


--- trunk/Source/JavaScriptCore/jit/JITOpcodes32_64.cpp	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/jit/JITOpcodes32_64.cpp	2021-06-07 22:51:25 UTC (rev 278576)
@@ -1066,7 +1066,7 @@
     // Even though JIT code doesn't use them, we initialize our constant
     // registers to zap stale pointers, to avoid unnecessarily prolonging
     // object lifetime and increasing GC pressure.
-    for (int i = CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters(); i < m_codeBlock->numVars(); ++i)
+    for (unsigned i = CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters(); i < m_codeBlock->numVars(); ++i)
         emitStore(virtualRegisterForLocal(i), jsUndefined());
 
     JITSlowPathCall slowPathCall(this, currentInstruction, slow_path_enter);

Modified: trunk/Source/JavaScriptCore/jit/ThunkGenerators.cpp (278575 => 278576)


--- trunk/Source/JavaScriptCore/jit/ThunkGenerators.cpp	2021-06-07 22:11:29 UTC (rev 278575)
+++ trunk/Source/JavaScriptCore/jit/ThunkGenerators.cpp	2021-06-07 22:51:25 UTC (rev 278576)
@@ -81,11 +81,7 @@
 {
     CCallHelpers jit;
 
-#if CPU(X86_64)
-    jit.addPtr(CCallHelpers::TrustedImm32(2 * sizeof(CPURegister)), X86Registers::esp);
-#elif CPU(ARM64)
-    jit.popPair(CCallHelpers::framePointerRegister, CCallHelpers::linkRegister);
-#endif
+    jit.addPtr(CCallHelpers::TrustedImm32(2 * sizeof(CPURegister)), CCallHelpers::stackPointerRegister);
 
     CCallHelpers::Jump continuation = jit.jump();
 