Skip to site navigation (Press enter)

[webkit-changes] [206274] trunk/Source

fpizlo Thu, 22 Sep 2016 14:11:49 -0700

Title: [206274] trunk/Source

Revision: 206274
Author: fpi...@apple.com
Date: 2016-09-22 14:11:42 -0700 (Thu, 22 Sep 2016)

Log Message

Fences on x86 should be a lot cheaper
https://bugs.webkit.org/show_bug.cgi?id=162417


Reviewed by Mark Lam and Geoffrey Garen.
Source/_javascript_Core:


It turns out that:
        
    lock; orl $0, (%rsp)
        
does everything that we wanted from:
        
    mfence
        
And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
So, we should use ortop from now on.
        
This part of the change is for the JITs. MacroAssembler::memoryFence() appears to always
mean something like an acqrel fence, so it's safe to make this use ortop. Since B3's Fence
compiles to Air MemoryFence, which is just MacroAssembler::memoryFence(), this also changes
B3 codegen.

* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::memoryFence):
* assembler/X86Assembler.h:
(JSC::X86Assembler::lock):
* b3/testb3.cpp:
(JSC::B3::testX86MFence):
(JSC::B3::testX86CompilerFence):

Source/WTF:

        
It turns out that:
        
    lock; orl $0, (%rsp)
        
does everything that we wanted from:
        
    mfence
        
And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
So, we should use ortop from now on.
        
This part of the change just affects our Atomics. I also changed this in the JITs.

* wtf/Atomics.h:
(WTF::x86_ortop):
(WTF::storeLoadFence):
(WTF::x86_mfence): Deleted.

Modified Paths

trunk/Source/_javascript_Core/ChangeLog
trunk/Source/_javascript_Core/assembler/MacroAssemblerX86Common.h
trunk/Source/_javascript_Core/assembler/X86Assembler.h
trunk/Source/_javascript_Core/b3/testb3.cpp
trunk/Source/WTF/ChangeLog
trunk/Source/WTF/wtf/Atomics.h

Diff

Modified: trunk/Source/_javascript_Core/ChangeLog (206273 => 206274)


--- trunk/Source/_javascript_Core/ChangeLog	2016-09-22 21:05:46 UTC (rev 206273)
+++ trunk/Source/_javascript_Core/ChangeLog	2016-09-22 21:11:42 UTC (rev 206274)
@@ -1,3 +1,35 @@
+2016-09-22  Filip Pizlo  <fpi...@apple.com>
+
+        Fences on x86 should be a lot cheaper
+        https://bugs.webkit.org/show_bug.cgi?id=162417
+
+        Reviewed by Mark Lam and Geoffrey Garen.
+
+        It turns out that:
+        
+            lock; orl $0, (%rsp)
+        
+        does everything that we wanted from:
+        
+            mfence
+        
+        And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
+        friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
+        So, we should use ortop from now on.
+        
+        This part of the change is for the JITs. MacroAssembler::memoryFence() appears to always
+        mean something like an acqrel fence, so it's safe to make this use ortop. Since B3's Fence
+        compiles to Air MemoryFence, which is just MacroAssembler::memoryFence(), this also changes
+        B3 codegen.
+
+        * assembler/MacroAssemblerX86Common.h:
+        (JSC::MacroAssemblerX86Common::memoryFence):
+        * assembler/X86Assembler.h:
+        (JSC::X86Assembler::lock):
+        * b3/testb3.cpp:
+        (JSC::B3::testX86MFence):
+        (JSC::B3::testX86CompilerFence):
+
 2016-09-22  Joseph Pecoraro  <pecor...@apple.com>
 
         test262: Function length should be number of parameters before parameters with default values

Modified: trunk/Source/_javascript_Core/assembler/MacroAssemblerX86Common.h (206273 => 206274)


--- trunk/Source/_javascript_Core/assembler/MacroAssemblerX86Common.h	2016-09-22 21:05:46 UTC (rev 206273)
+++ trunk/Source/_javascript_Core/assembler/MacroAssemblerX86Common.h	2016-09-22 21:11:42 UTC (rev 206274)
@@ -2629,9 +2629,12 @@
         m_assembler.nop();
     }
     
+    // We take memoryFence to mean acqrel. This has acqrel semantics on x86.
     void memoryFence()
     {
-        m_assembler.mfence();
+        // lock; orl $0, (%rsp)
+        m_assembler.lock();
+        m_assembler.orl_im(0, 0, X86Registers::esp);
     }
 
     static void replaceWithJump(CodeLocationLabel instructionStart, CodeLocationLabel destination)

Modified: trunk/Source/_javascript_Core/assembler/X86Assembler.h (206273 => 206274)


--- trunk/Source/_javascript_Core/assembler/X86Assembler.h	2016-09-22 21:05:46 UTC (rev 206273)
+++ trunk/Source/_javascript_Core/assembler/X86Assembler.h	2016-09-22 21:11:42 UTC (rev 206274)
@@ -249,6 +249,7 @@
         OP_ESCAPE_DD                    = 0xDD,
         OP_CALL_rel32                   = 0xE8,
         OP_JMP_rel32                    = 0xE9,
+        PRE_LOCK                        = 0xF0,
         PRE_SSE_F2                      = 0xF2,
         PRE_SSE_F3                      = 0xF3,
         OP_HLT                          = 0xF4,
@@ -2684,6 +2685,11 @@
         m_formatter.prefix(PRE_PREDICT_BRANCH_NOT_TAKEN);
     }
     
+    void lock()
+    {
+        m_formatter.prefix(PRE_LOCK);
+    }
+    
     void mfence()
     {
         m_formatter.threeByteOp(OP2_3BYTE_ESCAPE_AE, OP3_MFENCE);

Modified: trunk/Source/_javascript_Core/b3/testb3.cpp (206273 => 206274)


--- trunk/Source/_javascript_Core/b3/testb3.cpp	2016-09-22 21:05:46 UTC (rev 206273)
+++ trunk/Source/_javascript_Core/b3/testb3.cpp	2016-09-22 21:11:42 UTC (rev 206274)
@@ -13062,7 +13062,8 @@
     root->appendNew<Value>(proc, Return, Origin());
     
     auto code = compile(proc);
-    checkUsesInstruction(*code, "mfence");
+    checkUsesInstruction(*code, "lock or $0x0, (%rsp)");
+    checkDoesNotUseInstruction(*code, "mfence");
 }
 
 void testX86CompilerFence()
@@ -13075,6 +13076,7 @@
     root->appendNew<Value>(proc, Return, Origin());
     
     auto code = compile(proc);
+    checkDoesNotUseInstruction(*code, "lock");
     checkDoesNotUseInstruction(*code, "mfence");
 }

Modified: trunk/Source/WTF/ChangeLog (206273 => 206274)


--- trunk/Source/WTF/ChangeLog	2016-09-22 21:05:46 UTC (rev 206273)
+++ trunk/Source/WTF/ChangeLog	2016-09-22 21:11:42 UTC (rev 206274)
@@ -1,3 +1,29 @@
+2016-09-22  Filip Pizlo  <fpi...@apple.com>
+
+        Fences on x86 should be a lot cheaper
+        https://bugs.webkit.org/show_bug.cgi?id=162417
+
+        Reviewed by Mark Lam and Geoffrey Garen.
+        
+        It turns out that:
+        
+            lock; orl $0, (%rsp)
+        
+        does everything that we wanted from:
+        
+            mfence
+        
+        And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
+        friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
+        So, we should use ortop from now on.
+        
+        This part of the change just affects our Atomics. I also changed this in the JITs.
+
+        * wtf/Atomics.h:
+        (WTF::x86_ortop):
+        (WTF::storeLoadFence):
+        (WTF::x86_mfence): Deleted.
+
 2016-09-21  Alexey Proskuryakov  <a...@apple.com>
 
         Rolling out r206244, as it caused flaky crashes on tests.

Modified: trunk/Source/WTF/wtf/Atomics.h (206273 => 206274)


--- trunk/Source/WTF/wtf/Atomics.h	2016-09-22 21:05:46 UTC (rev 206273)
+++ trunk/Source/WTF/wtf/Atomics.h	2016-09-22 21:11:42 UTC (rev 206274)
@@ -167,7 +167,7 @@
 
 #elif CPU(X86) || CPU(X86_64)
 
-inline void x86_mfence()
+inline void x86_ortop()
 {
 #if OS(WINDOWS)
     // I think that this does the equivalent of a dummy interlocked instruction,
@@ -176,13 +176,15 @@
     // investigate if that is actually better.
     MemoryBarrier();
 #else
-    asm volatile("mfence" ::: "memory");
+    // This has acqrel semantics and is much cheaper than mfence. For exampe, in the JSC GC, using
+    // mfence as a store-load fence was a 9% slow-down on Octane/splay while using this was neutral.
+    asm volatile("lock; orl $0, (%%rsp)" ::: "memory");
 #endif
 }
 
 inline void loadLoadFence() { compilerFence(); }
 inline void loadStoreFence() { compilerFence(); }
-inline void storeLoadFence() { x86_mfence(); }
+inline void storeLoadFence() { x86_ortop(); }
 inline void storeStoreFence() { compilerFence(); }
 inline void memoryBarrierAfterLock() { compilerFence(); }
 inline void memoryBarrierBeforeUnlock() { compilerFence(); }

_______________________________________________
webkit-changes mailing list
webkit-changes@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-changes