Diff
Modified: trunk/Source/JavaScriptCore/CMakeLists.txt (194330 => 194331)
--- trunk/Source/JavaScriptCore/CMakeLists.txt 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/CMakeLists.txt 2015-12-21 16:16:01 UTC (rev 194331)
@@ -90,6 +90,7 @@
b3/air/AirSpillEverything.cpp
b3/air/AirStackSlot.cpp
b3/air/AirTmp.cpp
+ b3/air/AirTmpWidth.cpp
b3/air/AirValidate.cpp
b3/B3ArgumentRegValue.cpp
Modified: trunk/Source/JavaScriptCore/ChangeLog (194330 => 194331)
--- trunk/Source/JavaScriptCore/ChangeLog 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/ChangeLog 2015-12-21 16:16:01 UTC (rev 194331)
@@ -1,3 +1,123 @@
+2015-12-21 Filip Pizlo <[email protected]>
+
+ B3->Air lowering incorrectly copy-propagates over ZExt32's
+ https://bugs.webkit.org/show_bug.cgi?id=152365
+
+ Reviewed by Benjamin Poulain.
+
+ The instruction selector thinks that Value's that return Int32's are going to always be lowered
+ to instructions that zero-extend the destination. But this isn't actually true. If you have an
+ Add32 with a destination on the stack (i.e. spilled) then it only writes 4 bytes. Then, the
+ filler will load 8 bytes from the stack at the point of use. So, the use of the Add32 will see
+ garbage in the high bits.
+
+ The fact that the spiller chose to use 8 bytes for a Tmp that gets defined by an Add32 is a
+ pretty sad bug, but:
+
+ - It's entirely up to the spiller to decide how many bytes to use for a Tmp, since we do not
+ ascribe a type to Tmps. We could ascribe types to Tmps, but then coalescing would become
+ harder. Our goal is to fix the bug while still enabling coalescing in cases like "a[i]" where
+ "i" is a 32-bit integer that is computed using operations that already do zero-extension.
+
+ - More broadly, it's strange that the instruction selector decides whether a Value will be
+ lowered to something that zero-extends. That's too constraining, since the most optimal
+ instruction selection might involve something that doesn't zero-extend in cases of spilling, so
+ the zero-extension should only happen if it's actually needed. This means that we need to
+ understand which Air instructions cause zero-extensions.
+
+ - If we know which Air instructions cause zero-extensions, then we don't need the instruction
+ selector to copy-propagate ZExt32's. We have copy-propagation in Air thanks to the register
+ allocator.
+
+ In fact, the register allocator is exactly where all of the pieces come together. It's there that
+ we want to know which operations zero-extend and which don't. It also wants to know how many bits
+ of a Tmp each instruction reads. Armed with that information, the register allocator can emit
+ more optimal spill code, use less stack space for spill slots, and coalesce Move32's. As a bonus,
+ on X86, it replaces Move's with Move32's whenever it can. On X86, Move32 is cheaper.
+
+ This fixes a crash bug in V8/encrypt. After fixing this, I only needed two minor fixes to get
+ V8/encrypt to run. We're about 10% behind LLVM on steady state throughput on this test. It
+ appears to be mostly due to excessive spilling caused by CCall slow paths. That's fixable: we
+ could make CCalls on slow paths use a variant of CCallSpecial that promises not to clobber any
+ registers, and then have it emit spill code around the call itself. LLVM probably gets this
+ optimization from its live range splitting.
+
+ I tried writing a regression test. The problem is that you need garbage on the stack for this to
+ work, and I didn't feel like writing a flaky test. It appears that running V8/encrypt will cover
+ this, so we do have coverage.
+
+ * CMakeLists.txt:
+ * JavaScriptCore.xcodeproj/project.pbxproj:
+ * assembler/AbstractMacroAssembler.h:
+ (JSC::isX86):
+ (JSC::isX86_64):
+ (JSC::optimizeForARMv7IDIVSupported):
+ (JSC::optimizeForX86):
+ (JSC::optimizeForX86_64):
+ * b3/B3LowerToAir.cpp:
+ (JSC::B3::Air::LowerToAir::highBitsAreZero):
+ (JSC::B3::Air::LowerToAir::shouldCopyPropagate):
+ (JSC::B3::Air::LowerToAir::lower):
+ * b3/B3PatchpointSpecial.cpp:
+ (JSC::B3::PatchpointSpecial::forEachArg):
+ * b3/B3StackmapSpecial.cpp:
+ (JSC::B3::StackmapSpecial::forEachArgImpl):
+ * b3/B3Value.h:
+ * b3/air/AirAllocateStack.cpp:
+ (JSC::B3::Air::allocateStack):
+ * b3/air/AirArg.cpp:
+ (WTF::printInternal):
+ * b3/air/AirArg.h:
+ (JSC::B3::Air::Arg::pointerWidth):
+ (JSC::B3::Air::Arg::isAnyUse):
+ (JSC::B3::Air::Arg::isColdUse):
+ (JSC::B3::Air::Arg::isEarlyUse):
+ (JSC::B3::Air::Arg::isDef):
+ (JSC::B3::Air::Arg::isZDef):
+ (JSC::B3::Air::Arg::widthForB3Type):
+ (JSC::B3::Air::Arg::conservativeWidth):
+ (JSC::B3::Air::Arg::minimumWidth):
+ (JSC::B3::Air::Arg::bytes):
+ (JSC::B3::Air::Arg::widthForBytes):
+ (JSC::B3::Air::Arg::Arg):
+ (JSC::B3::Air::Arg::forEachTmp):
+ * b3/air/AirCCallSpecial.cpp:
+ (JSC::B3::Air::CCallSpecial::forEachArg):
+ * b3/air/AirEliminateDeadCode.cpp:
+ (JSC::B3::Air::eliminateDeadCode):
+ * b3/air/AirFixPartialRegisterStalls.cpp:
+ (JSC::B3::Air::fixPartialRegisterStalls):
+ * b3/air/AirInst.cpp:
+ (JSC::B3::Air::Inst::hasArgEffects):
+ * b3/air/AirInst.h:
+ (JSC::B3::Air::Inst::forEachTmpFast):
+ (JSC::B3::Air::Inst::forEachTmp):
+ * b3/air/AirInstInlines.h:
+ (JSC::B3::Air::Inst::forEachTmpWithExtraClobberedRegs):
+ * b3/air/AirIteratedRegisterCoalescing.cpp:
+ * b3/air/AirLiveness.h:
+ (JSC::B3::Air::AbstractLiveness::AbstractLiveness):
+ (JSC::B3::Air::AbstractLiveness::LocalCalc::execute):
+ * b3/air/AirOpcode.opcodes:
+ * b3/air/AirSpillEverything.cpp:
+ (JSC::B3::Air::spillEverything):
+ * b3/air/AirTmpWidth.cpp: Added.
+ (JSC::B3::Air::TmpWidth::TmpWidth):
+ (JSC::B3::Air::TmpWidth::~TmpWidth):
+ * b3/air/AirTmpWidth.h: Added.
+ (JSC::B3::Air::TmpWidth::width):
+ (JSC::B3::Air::TmpWidth::defWidth):
+ (JSC::B3::Air::TmpWidth::useWidth):
+ (JSC::B3::Air::TmpWidth::Widths::Widths):
+ * b3/air/AirUseCounts.h:
+ (JSC::B3::Air::UseCounts::UseCounts):
+ * b3/air/opcode_generator.rb:
+ * b3/testb3.cpp:
+ (JSC::B3::testCheckMegaCombo):
+ (JSC::B3::testCheckTrickyMegaCombo):
+ (JSC::B3::testCheckTwoMegaCombos):
+ (JSC::B3::run):
+
2015-12-21 Andy VanWagoner <[email protected]>
[INTL] Implement String.prototype.localeCompare in ECMA-402
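The following standalone C++ sketch is not part of the patch; it just illustrates the failure mode described in the ChangeLog above, assuming a little-endian 64-bit target: a 32-bit def writes only 4 bytes of an 8-byte spill slot, so an 8-byte fill at the point of use sees garbage in the high bits. All names in it are invented for the illustration.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        // Stale stack contents in an 8-byte spill slot.
        uint64_t spillSlot;
        std::memset(&spillSlot, 0xff, sizeof(spillSlot));

        // An Add32-style def only writes the low 4 bytes of the slot.
        uint32_t add32Result = 2 + 3;
        std::memcpy(&spillSlot, &add32Result, sizeof(add32Result));

        // The filler then loads all 8 bytes at the point of use.
        uint64_t reloaded;
        std::memcpy(&reloaded, &spillSlot, sizeof(reloaded));

        // Prints 0xffffffff00000005 rather than 5: the high bits are garbage.
        std::printf("0x%016llx\n", static_cast<unsigned long long>(reloaded));
        return 0;
    }
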
Modified: trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj (194330 => 194331)
--- trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj 2015-12-21 16:16:01 UTC (rev 194331)
@@ -695,6 +695,8 @@
0FE0502C1AA9095600D33B33 /* VarOffset.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE050231AA9095600D33B33 /* VarOffset.cpp */; };
0FE0502D1AA9095600D33B33 /* VarOffset.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE050241AA9095600D33B33 /* VarOffset.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FE0502F1AAA806900D33B33 /* ScopedArgumentsTable.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE0502E1AAA806900D33B33 /* ScopedArgumentsTable.cpp */; };
+ 0FE0E4AD1C24C94A002E17B6 /* AirTmpWidth.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE0E4AB1C24C94A002E17B6 /* AirTmpWidth.cpp */; };
+ 0FE0E4AE1C24C94A002E17B6 /* AirTmpWidth.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE0E4AC1C24C94A002E17B6 /* AirTmpWidth.h */; };
0FE228ED1436AB2700196C48 /* Options.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE228EB1436AB2300196C48 /* Options.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FE228EE1436AB2C00196C48 /* Options.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE228EA1436AB2300196C48 /* Options.cpp */; };
0FE254F61ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE254F41ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp */; };
@@ -2835,6 +2837,8 @@
0FE050231AA9095600D33B33 /* VarOffset.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = VarOffset.cpp; sourceTree = "<group>"; };
0FE050241AA9095600D33B33 /* VarOffset.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = VarOffset.h; sourceTree = "<group>"; };
0FE0502E1AAA806900D33B33 /* ScopedArgumentsTable.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ScopedArgumentsTable.cpp; sourceTree = "<group>"; };
+ 0FE0E4AB1C24C94A002E17B6 /* AirTmpWidth.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirTmpWidth.cpp; path = b3/air/AirTmpWidth.cpp; sourceTree = "<group>"; };
+ 0FE0E4AC1C24C94A002E17B6 /* AirTmpWidth.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirTmpWidth.h; path = b3/air/AirTmpWidth.h; sourceTree = "<group>"; };
0FE228EA1436AB2300196C48 /* Options.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Options.cpp; sourceTree = "<group>"; };
0FE228EB1436AB2300196C48 /* Options.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Options.h; sourceTree = "<group>"; };
0FE254F41ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGVarargsForwardingPhase.cpp; path = dfg/DFGVarargsForwardingPhase.cpp; sourceTree = "<group>"; };
@@ -4800,6 +4804,8 @@
0FEC85681BDACDC70080FF74 /* AirTmp.cpp */,
0FEC85691BDACDC70080FF74 /* AirTmp.h */,
0FEC856A1BDACDC70080FF74 /* AirTmpInlines.h */,
+ 0FE0E4AB1C24C94A002E17B6 /* AirTmpWidth.cpp */,
+ 0FE0E4AC1C24C94A002E17B6 /* AirTmpWidth.h */,
0F3730921C0D67EE00052BFA /* AirUseCounts.h */,
0FEC856B1BDACDC70080FF74 /* AirValidate.cpp */,
0FEC856C1BDACDC70080FF74 /* AirValidate.h */,
@@ -7296,6 +7302,7 @@
0F235BD917178E1C00690C7F /* FTLExitThunkGenerator.h in Headers */,
0F2B9CF719D0BAC100B1D1B5 /* FTLExitTimeObjectMaterialization.h in Headers */,
0F235BDB17178E1C00690C7F /* FTLExitValue.h in Headers */,
+ 0FE0E4AE1C24C94A002E17B6 /* AirTmpWidth.h in Headers */,
A7F2996C17A0BB670010417A /* FTLFail.h in Headers */,
0FEA0A2C170B661900BB722C /* FTLFormattedValue.h in Headers */,
0FD8A31A17D51F2200CA2C40 /* FTLForOSREntryJITCode.h in Headers */,
@@ -9229,6 +9236,7 @@
0F766D3815AE4A1C008F363E /* StructureStubClearingWatchpoint.cpp in Sources */,
BCCF0D0C0EF0B8A500413C8F /* StructureStubInfo.cpp in Sources */,
705B41AB1A6E501E00716757 /* Symbol.cpp in Sources */,
+ 0FE0E4AD1C24C94A002E17B6 /* AirTmpWidth.cpp in Sources */,
705B41AD1A6E501E00716757 /* SymbolConstructor.cpp in Sources */,
705B41AF1A6E501E00716757 /* SymbolObject.cpp in Sources */,
705B41B11A6E501E00716757 /* SymbolPrototype.cpp in Sources */,
Modified: trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -67,6 +67,15 @@
#endif
}
+inline bool isX86_64()
+{
+#if CPU(X86_64)
+ return true;
+#else
+ return false;
+#endif
+}
+
inline bool optimizeForARMv7IDIVSupported()
{
return isARMv7IDIVSupported() && Options::useArchitectureSpecificOptimizations();
@@ -82,6 +91,11 @@
return isX86() && Options::useArchitectureSpecificOptimizations();
}
+inline bool optimizeForX86_64()
+{
+ return isX86_64() && Options::useArchitectureSpecificOptimizations();
+}
+
class AllowMacroScratchRegisterUsage;
class DisallowMacroScratchRegisterUsage;
class LinkBuffer;
Modified: trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -158,16 +158,12 @@
}
}
- // NOTE: This entire mechanism could be done over Air, if we felt that this would be fast enough.
- // For now we're assuming that it's faster to do this here, since analyzing B3 is so cheap.
bool shouldCopyPropagate(Value* value)
{
switch (value->opcode()) {
case Trunc:
case Identity:
return true;
- case ZExt32:
- return highBitsAreZero(value->child(0));
default:
return false;
}
@@ -1775,11 +1771,6 @@
}
case ZExt32: {
- if (highBitsAreZero(m_value->child(0))) {
- ASSERT(tmp(m_value->child(0)) == tmp(m_value));
- return;
- }
-
appendUnOp<Move32, Air::Oops>(m_value->child(0));
return;
}
Modified: trunk/Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -54,7 +54,7 @@
return;
}
- callback(inst.args[1], Arg::Def, inst.origin->airType());
+ callback(inst.args[1], Arg::Def, inst.origin->airType(), inst.origin->airWidth());
forEachArgImpl(0, 2, inst, SameAsRep, callback);
}
Modified: trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -111,8 +111,9 @@
role = Arg::LateUse;
break;
}
-
- callback(arg, role, Arg::typeForB3Type(child.value()->type()));
+
+ Type type = child.value()->type();
+ callback(arg, role, Arg::typeForB3Type(type), Arg::widthForB3Type(type));
}
}
Modified: trunk/Source/JavaScriptCore/b3/B3Value.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3Value.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3Value.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -76,6 +76,7 @@
// This is useful when lowering. Note that this is only valid for non-void values.
Air::Arg::Type airType() const { return Air::Arg::typeForB3Type(type()); }
+ Air::Arg::Width airWidth() const { return Air::Arg::widthForB3Type(type()); }
AdjacencyList& children() { return m_children; }
const AdjacencyList& children() const { return m_children; }
Modified: trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -104,7 +104,7 @@
for (BasicBlock* block : code) {
for (Inst& inst : *block) {
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
if (role == Arg::UseAddr && arg.isStack())
escapingStackSlots.add(arg.stackSlot());
});
@@ -148,7 +148,7 @@
dataLog("Interfering: ", WTF::pointerListDump(localCalc.live()), "\n");
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
if (!Arg::isDef(role))
return;
if (!arg.isStack())
Modified: trunk/Source/JavaScriptCore/b3/air/AirArg.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirArg.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirArg.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -178,6 +178,12 @@
case Arg::UseDef:
out.print("UseDef");
return;
+ case Arg::ZDef:
+ out.print("ZDef");
+ return;
+ case Arg::UseZDef:
+ out.print("UseZDef");
+ return;
case Arg::UseAddr:
out.print("UseAddr");
return;
Modified: trunk/Source/JavaScriptCore/b3/air/AirArg.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirArg.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirArg.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -101,9 +101,28 @@
// Like Use of address, Def of address does not mean escape.
Def,
+ // This is a special variant of Def that implies that the upper bits of the target register are
+ // zero-filled. Specifically, if the Width of a ZDef is less than the largest possible width of
+ // the argument (for example, we're on a 64-bit machine and we have a Width32 ZDef of a GPR) then
+ // this has different implications for the upper bits (i.e. the top 32 bits in our example)
+ // depending on the kind of the argument:
+ //
+ // For register: the upper bits are zero-filled.
+ // For address: the upper bits are not touched (i.e. we do a 32-bit store in our example).
+ // For tmp: either the upper bits are not touched or they are zero-filled, and we won't know
+ // which until we lower the tmp to either a StackSlot or a Reg.
+ //
+ // The behavior of ZDef is consistent with what happens when you perform 32-bit operations on a
+ // 64-bit GPR. It's not consistent with what happens with 8-bit or 16-bit Defs on x86 GPRs, or
+ // what happens with float Defs in ARM NEON or X86 SSE. Hence we have both Def and ZDef.
+ ZDef,
+
// This is a combined Use and Def. It means that both things happen.
UseDef,
+ // This is a combined Use and ZDef. It means that both things happen.
+ UseZDef,
+
// This is a special kind of use that is only valid for addresses. It means that the
// instruction will evaluate the address expression and consume the effective address, but it
// will neither load nor store. This is an escaping use, because now the address may be
@@ -126,6 +145,13 @@
Width64
};
+ static Width pointerWidth()
+ {
+ if (sizeof(void*) == 8)
+ return Width64;
+ return Width32;
+ }
+
enum Signedness : int8_t {
Signed,
Unsigned
@@ -139,9 +165,11 @@
case Use:
case ColdUse:
case UseDef:
+ case UseZDef:
case LateUse:
return true;
case Def:
+ case ZDef:
case UseAddr:
return false;
}
@@ -155,7 +183,9 @@
return true;
case Use:
case UseDef:
+ case UseZDef:
case Def:
+ case ZDef:
case UseAddr:
return false;
}
@@ -173,8 +203,10 @@
case Use:
case ColdUse:
case UseDef:
+ case UseZDef:
return true;
case Def:
+ case ZDef:
case UseAddr:
case LateUse:
return false;
@@ -198,10 +230,29 @@
return false;
case Def:
case UseDef:
+ case ZDef:
+ case UseZDef:
return true;
}
}
+ // Returns true if the Role implies that the Inst will ZDef the Arg.
+ static bool isZDef(Role role)
+ {
+ switch (role) {
+ case Use:
+ case ColdUse:
+ case UseAddr:
+ case LateUse:
+ case Def:
+ case UseDef:
+ return false;
+ case ZDef:
+ case UseZDef:
+ return true;
+ }
+ }
+
static Type typeForB3Type(B3::Type type)
{
switch (type) {
@@ -234,6 +285,37 @@
}
}
+ static Width conservativeWidth(Type type)
+ {
+ return type == GP ? pointerWidth() : Width64;
+ }
+
+ static Width minimumWidth(Type type)
+ {
+ return type == GP ? Width8 : Width32;
+ }
+
+ static unsigned bytes(Width width)
+ {
+ return 1 << width;
+ }
+
+ static Width widthForBytes(unsigned bytes)
+ {
+ switch (bytes) {
+ case 0:
+ case 1:
+ return Width8;
+ case 2:
+ return Width16;
+ case 3:
+ case 4:
+ return Width32;
+ default:
+ return Width64;
+ }
+ }
+
Arg()
: m_kind(Invalid)
{
@@ -717,19 +799,19 @@
//
// This defs (%rcx) but uses %rcx.
template<typename Functor>
- void forEachTmp(Role argRole, Type argType, const Functor& functor)
+ void forEachTmp(Role argRole, Type argType, Width argWidth, const Functor& functor)
{
switch (m_kind) {
case Tmp:
ASSERT(isAnyUse(argRole) || isDef(argRole));
- functor(m_base, argRole, argType);
+ functor(m_base, argRole, argType, argWidth);
break;
case Addr:
- functor(m_base, Use, GP);
+ functor(m_base, Use, GP, argRole == UseAddr ? argWidth : pointerWidth());
break;
case Index:
- functor(m_base, Use, GP);
- functor(m_index, Use, GP);
+ functor(m_base, Use, GP, argRole == UseAddr ? argWidth : pointerWidth());
+ functor(m_index, Use, GP, argRole == UseAddr ? argWidth : pointerWidth());
break;
default:
break;
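The width helpers added to Arg above are easy to sanity-check in isolation. The following sketch is not the JSC code; it re-states bytes() and widthForBytes() from the hunk with a standalone Width enum (assumed to start at Width8 == 0, which is what bytes() == 1 << width requires) and verifies the byte/width mapping. pointerWidth() likewise returns Width64 exactly when sizeof(void*) == 8.

    #include <cassert>

    enum Width { Width8, Width16, Width32, Width64 }; // same order as Arg::Width

    unsigned bytes(Width width) { return 1 << width; }

    Width widthForBytes(unsigned bytes)
    {
        switch (bytes) {
        case 0:
        case 1:
            return Width8;
        case 2:
            return Width16;
        case 3:
        case 4:
            return Width32;
        default:
            return Width64;
        }
    }

    int main()
    {
        assert(bytes(Width8) == 1 && bytes(Width16) == 2);
        assert(bytes(Width32) == 4 && bytes(Width64) == 8);
        assert(widthForBytes(3) == Width32);  // rounds up to the next width
        assert(widthForBytes(16) == Width64); // anything past 8 saturates at Width64
        return 0;
    }
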
Modified: trunk/Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -45,16 +45,17 @@
void CCallSpecial::forEachArg(Inst& inst, const ScopedLambda<Inst::EachArgCallback>& callback)
{
for (unsigned i = 0; i < numCalleeArgs; ++i)
- callback(inst.args[calleeArgOffset + i], Arg::Use, Arg::GP);
+ callback(inst.args[calleeArgOffset + i], Arg::Use, Arg::GP, Arg::pointerWidth());
for (unsigned i = 0; i < numReturnGPArgs; ++i)
- callback(inst.args[returnGPArgOffset + i], Arg::Def, Arg::GP);
+ callback(inst.args[returnGPArgOffset + i], Arg::Def, Arg::GP, Arg::pointerWidth());
for (unsigned i = 0; i < numReturnFPArgs; ++i)
- callback(inst.args[returnFPArgOffset + i], Arg::Def, Arg::FP);
+ callback(inst.args[returnFPArgOffset + i], Arg::Def, Arg::FP, Arg::Width64);
for (unsigned i = argArgOffset; i < inst.args.size(); ++i) {
// For the type, we can just query the arg's type. The arg will have a type, because we
// require these args to be argument registers.
- callback(inst.args[i], Arg::Use, inst.args[i].type());
+ Arg::Type type = inst.args[i].type();
+ callback(inst.args[i], Arg::Use, type, Arg::conservativeWidth(type));
}
}
Modified: trunk/Source/JavaScriptCore/b3/air/AirEliminateDeadCode.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirEliminateDeadCode.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirEliminateDeadCode.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -80,7 +80,7 @@
// This instruction should be presumed dead, if its Args are all dead.
bool storesToLive = false;
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
if (!Arg::isDef(role))
return;
storesToLive |= isArgLive(arg);
Modified: trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -113,7 +113,7 @@
return;
}
- inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type) {
+ inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type, Arg::Width) {
ASSERT_WITH_MESSAGE(tmp.isReg(), "This phase must be run after register allocation.");
if (tmp.isFPR() && Arg::isDef(role))
@@ -203,7 +203,7 @@
if (hasPartialXmmRegUpdate(inst)) {
RegisterSet defs;
RegisterSet uses;
- inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type) {
+ inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type, Arg::Width) {
if (tmp.isFPR()) {
if (Arg::isDef(role))
defs.set(tmp.fpr());
Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirInst.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -38,7 +38,7 @@
{
bool result = false;
forEachArg(
- [&] (Arg&, Arg::Role role, Arg::Type) {
+ [&] (Arg&, Arg::Role role, Arg::Type, Arg::Width) {
if (Arg::isDef(role))
result = true;
});
Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirInst.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -98,20 +98,20 @@
arg.forEachTmpFast(functor);
}
- typedef void EachArgCallback(Arg&, Arg::Role, Arg::Type);
+ typedef void EachArgCallback(Arg&, Arg::Role, Arg::Type, Arg::Width);
- // Calls the functor with (arg, role, type). This function is auto-generated by
+ // Calls the functor with (arg, role, type, width). This function is auto-generated by
// opcode_generator.rb.
template<typename Functor>
void forEachArg(const Functor&);
- // Calls the functor with (tmp, role, type).
+ // Calls the functor with (tmp, role, type, width).
template<typename Functor>
void forEachTmp(const Functor& functor)
{
forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type type) {
- arg.forEachTmp(role, type, functor);
+ [&] (Arg& arg, Arg::Role role, Arg::Type type, Arg::Width width) {
+ arg.forEachTmp(role, type, width, functor);
});
}
Modified: trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -55,7 +55,7 @@
static void forEach(Inst& inst, const Functor& functor)
{
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type type, Arg::Width width) {
if (!arg.isStack())
return;
StackSlot* stackSlot = arg.stackSlot();
@@ -66,7 +66,7 @@
// semantics of "Anonymous".
// https://bugs.webkit.org/show_bug.cgi?id=151128
- functor(stackSlot, role, type);
+ functor(stackSlot, role, type, width);
arg = Arg::stack(stackSlot, arg.offset());
});
}
@@ -99,12 +99,13 @@
inline void Inst::forEachTmpWithExtraClobberedRegs(Inst* nextInst, const Functor& functor)
{
forEachTmp(
- [&] (Tmp& tmpArg, Arg::Role role, Arg::Type argType) {
- functor(tmpArg, role, argType);
+ [&] (Tmp& tmpArg, Arg::Role role, Arg::Type argType, Arg::Width argWidth) {
+ functor(tmpArg, role, argType, argWidth);
});
auto reportReg = [&] (Reg reg) {
- functor(Tmp(reg), Arg::Def, reg.isGPR() ? Arg::GP : Arg::FP);
+ Arg::Type type = reg.isGPR() ? Arg::GP : Arg::FP;
+ functor(Tmp(reg), Arg::Def, type, Arg::conservativeWidth(type));
};
if (hasSpecial())
Modified: trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -35,6 +35,7 @@
#include "AirPhaseScope.h"
#include "AirRegisterPriority.h"
#include "AirTmpInlines.h"
+#include "AirTmpWidth.h"
#include "AirUseCounts.h"
#include <wtf/ListDump.h>
#include <wtf/ListHashSet.h>
@@ -630,9 +631,10 @@
template<Arg::Type type>
class ColoringAllocator : public AbstractColoringAllocator<unsigned> {
public:
- ColoringAllocator(Code& code, const UseCounts<Tmp>& useCounts, const HashSet<unsigned>& unspillableTmp)
+ ColoringAllocator(Code& code, TmpWidth& tmpWidth, const UseCounts<Tmp>& useCounts, const HashSet<unsigned>& unspillableTmp)
: AbstractColoringAllocator<unsigned>(regsInPriorityOrder(type), AbsoluteTmpMapper<type>::lastMachineRegisterIndex(), tmpArraySize(code))
, m_code(code)
+ , m_tmpWidth(tmpWidth)
, m_useCounts(useCounts)
, m_unspillableTmps(unspillableTmp)
{
@@ -646,9 +648,17 @@
return AbsoluteTmpMapper<type>::tmpFromAbsoluteIndex(getAlias(AbsoluteTmpMapper<type>::absoluteIndex(tmp)));
}
+ // This tells you if a Move will be coalescable if the src and dst end up matching. This method
+ // relies on an analysis that is invalidated by register allocation, so it's only meaningful to
+ // call this *before* replacing the Tmp's in this Inst with registers or spill slots.
+ bool mayBeCoalescable(const Inst& inst) const
+ {
+ return mayBeCoalescableImpl(inst, &m_tmpWidth);
+ }
+
bool isUselessMove(const Inst& inst) const
{
- return mayBeCoalescable(inst) && inst.args[0].tmp() == inst.args[1].tmp();
+ return mayBeCoalescableImpl(inst, nullptr) && inst.args[0].tmp() == inst.args[1].tmp();
}
Tmp getAliasWhenSpilling(Tmp tmp) const
@@ -770,14 +780,14 @@
{
inst.forEachTmpWithExtraClobberedRegs(
nextInst,
- [&] (const Tmp& arg, Arg::Role role, Arg::Type argType) {
+ [&] (const Tmp& arg, Arg::Role role, Arg::Type argType, Arg::Width) {
if (!Arg::isDef(role) || argType != type)
return;
// All the Def()s interfere with each other and with all the extra clobbered Tmps.
// We should not use forEachDefAndExtraClobberedTmp() here since colored Tmps
// do not need interference edges in our implementation.
- inst.forEachTmp([&] (Tmp& otherArg, Arg::Role role, Arg::Type argType) {
+ inst.forEachTmp([&] (Tmp& otherArg, Arg::Role role, Arg::Type argType, Arg::Width) {
if (!Arg::isDef(role) || argType != type)
return;
@@ -791,7 +801,7 @@
// coalesce the Move even if the two Tmp never interfere anywhere.
Tmp defTmp;
Tmp useTmp;
- inst.forEachTmp([&defTmp, &useTmp] (Tmp& argTmp, Arg::Role role, Arg::Type) {
+ inst.forEachTmp([&defTmp, &useTmp] (Tmp& argTmp, Arg::Role role, Arg::Type, Arg::Width) {
if (Arg::isDef(role))
defTmp = argTmp;
else {
@@ -839,7 +849,7 @@
// All the Def()s interfere with everything live.
inst.forEachTmpWithExtraClobberedRegs(
nextInst,
- [&] (const Tmp& arg, Arg::Role role, Arg::Type argType) {
+ [&] (const Tmp& arg, Arg::Role role, Arg::Type argType, Arg::Width) {
if (!Arg::isDef(role) || argType != type)
return;
@@ -857,12 +867,15 @@
addEdge(AbsoluteTmpMapper<type>::absoluteIndex(a), AbsoluteTmpMapper<type>::absoluteIndex(b));
}
- bool mayBeCoalescable(const Inst& inst) const
+ // Calling this without a tmpWidth will perform a more conservative coalescing analysis that assumes
+ // that Move32's are not coalescable.
+ static bool mayBeCoalescableImpl(const Inst& inst, TmpWidth* tmpWidth)
{
switch (type) {
case Arg::GP:
switch (inst.opcode) {
case Move:
+ case Move32:
break;
default:
return false;
@@ -887,6 +900,22 @@
ASSERT(inst.args[0].type() == type);
ASSERT(inst.args[1].type() == type);
+ // We can coalesce a Move32 so long as either of the following holds:
+ // - The input is already zero-filled.
+ // - The output only cares about the low 32 bits.
+ //
+ // Note that the input property requires an analysis over ZDef's, so it's only valid so long
+ // as the input gets a register. We don't know if the input gets a register, but we do know
+ // that if it doesn't get a register then we will still emit this Move32.
+ if (inst.opcode == Move32) {
+ if (!tmpWidth)
+ return false;
+
+ if (tmpWidth->defWidth(inst.args[0].tmp()) > Arg::Width32
+ && tmpWidth->useWidth(inst.args[1].tmp()) > Arg::Width32)
+ return false;
+ }
+
return true;
}
@@ -1024,6 +1053,7 @@
using AbstractColoringAllocator<unsigned>::getAlias;
Code& m_code;
+ TmpWidth& m_tmpWidth;
// FIXME: spilling should not be type specific. It is only a side effect of using UseCounts.
const UseCounts<Tmp>& m_useCounts;
const HashSet<unsigned>& m_unspillableTmps;
@@ -1053,7 +1083,23 @@
HashSet<unsigned> unspillableTmps;
while (true) {
++m_numIterations;
- ColoringAllocator<type> allocator(m_code, m_useCounts, unspillableTmps);
+
+ // FIXME: One way to optimize this code is to remove the recomputation inside the fixpoint.
+ // We need to recompute because spilling adds tmps, but we could just update tmpWidth when we
+ // add those tmps. Note that one easy way to remove the recomputation is to make any newly
+ // added Tmps get the same use/def widths that the original Tmp got. But, this may hurt the
+ // spill code we emit. Since we currently recompute TmpWidth after spilling, the newly
+ // created Tmps may get narrower use/def widths. On the other hand, the spiller already
+ // selects which move instruction to use based on the original Tmp's widths, so it may not
+ // matter that a subsequent iteration sees a conservative width for the new Tmps. Also, the
+ // recomputation may not actually be a performance problem; it's likely that a better way to
+ // improve performance of TmpWidth is to replace its HashMap with something else. It's
+ // possible that most of the TmpWidth overhead is from queries of TmpWidth rather than the
+ // recomputation, in which case speeding up the lookup would be a bigger win.
+ // https://bugs.webkit.org/show_bug.cgi?id=152478
+ m_tmpWidth.recompute(m_code);
+
+ ColoringAllocator<type> allocator(m_code, m_tmpWidth, m_useCounts, unspillableTmps);
if (!allocator.requiresSpilling()) {
assignRegistersToTmp(allocator);
return;
@@ -1069,6 +1115,22 @@
// Give Tmp a valid register.
for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
Inst& inst = block->at(instIndex);
+
+ // The mayBeCoalescable() method will change its mind for some operations after we
+ // complete register allocation. So, we record this before starting.
+ bool mayBeCoalescable = allocator.mayBeCoalescable(inst);
+
+ // On X86_64, Move32 is cheaper if we know that it's equivalent to a Move. It's
+ // equivalent if the destination's high bits are not observable or if the source's high
+ // bits are all zero. Note that we don't have the opposite optimization for other
+ // architectures, which may prefer Move over Move32, because Move is canonical already.
+ if (type == Arg::GP && optimizeForX86_64() && inst.opcode == Move
+ && inst.args[0].isTmp() && inst.args[1].isTmp()) {
+ if (m_tmpWidth.useWidth(inst.args[1].tmp()) <= Arg::Width32
+ || m_tmpWidth.defWidth(inst.args[0].tmp()) <= Arg::Width32)
+ inst.opcode = Move32;
+ }
+
inst.forEachTmpFast([&] (Tmp& tmp) {
if (tmp.isReg() || tmp.isGP() == (type != Arg::GP))
return;
@@ -1085,11 +1147,15 @@
ASSERT(assignedTmp.isReg());
tmp = assignedTmp;
});
+
+ if (mayBeCoalescable && inst.args[0].isTmp() && inst.args[1].isTmp()
+ && inst.args[0].tmp() == inst.args[1].tmp())
+ inst = Inst();
}
// Remove all the useless moves we created in this block.
block->insts().removeAllMatching([&] (const Inst& inst) {
- return allocator.isUselessMove(inst);
+ return !inst;
});
}
}
@@ -1103,7 +1169,9 @@
unspillableTmps.add(AbsoluteTmpMapper<type>::absoluteIndex(tmp));
// Allocate stack slot for each spilled value.
- bool isNewTmp = stackSlots.add(tmp, m_code.addStackSlot(8, StackSlotKind::Anonymous)).isNewEntry;
+ StackSlot* stackSlot = m_code.addStackSlot(
+ m_tmpWidth.width(tmp) <= Arg::Width32 ? 4 : 8, StackSlotKind::Anonymous);
+ bool isNewTmp = stackSlots.add(tmp, stackSlot).isNewEntry;
ASSERT_UNUSED(isNewTmp, isNewTmp);
}
@@ -1115,18 +1183,36 @@
for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
Inst& inst = block->at(instIndex);
+ // The TmpWidth analysis will say that a Move only stores 32 bits into the destination,
+ // if the source only had 32 bits worth of non-zero bits. Same for the source: it will
+ // only claim to read 32 bits from the source if only 32 bits of the destination are
+ // read. Note that we only apply this logic if this turns into a load or store, since
+ // Move is the canonical way to move data between GPRs.
+ bool forceMove32IfDidSpill = false;
+ bool didSpill = false;
+ if (type == Arg::GP && inst.opcode == Move) {
+ if (m_tmpWidth.defWidth(inst.args[0].tmp()) <= Arg::Width32
+ || m_tmpWidth.useWidth(inst.args[1].tmp()) <= Arg::Width32)
+ forceMove32IfDidSpill = true;
+ }
+
// Try to replace the register use by memory use when possible.
for (unsigned i = 0; i < inst.args.size(); ++i) {
Arg& arg = inst.args[i];
if (arg.isTmp() && arg.type() == type && !arg.isReg()) {
auto stackSlotEntry = stackSlots.find(arg.tmp());
- if (stackSlotEntry != stackSlots.end() && inst.admitsStack(i))
+ if (stackSlotEntry != stackSlots.end() && inst.admitsStack(i)) {
arg = Arg::stack(stackSlotEntry->value);
+ didSpill = true;
+ }
}
}
+ if (didSpill && forceMove32IfDidSpill)
+ inst.opcode = Move32;
+
// For every other case, add Load/Store as needed.
- inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type argType) {
+ inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type argType, Arg::Width) {
if (tmp.isReg() || argType != type)
return;
@@ -1141,7 +1227,18 @@
}
Arg arg = Arg::stack(stackSlotEntry->value);
- Opcode move = type == Arg::GP ? Move : MoveDouble;
+ Opcode move = Oops;
+ switch (stackSlotEntry->value->byteSize()) {
+ case 4:
+ move = type == Arg::GP ? Move32 : MoveFloat;
+ break;
+ case 8:
+ move = type == Arg::GP ? Move : MoveDouble;
+ break;
+ default:
+ RELEASE_ASSERT_NOT_REACHED();
+ break;
+ }
if (Arg::isAnyUse(role)) {
Tmp newTmp = m_code.newTmp(type);
@@ -1166,6 +1263,7 @@
}
Code& m_code;
+ TmpWidth m_tmpWidth;
UseCounts<Tmp> m_useCounts;
unsigned m_numIterations { 0 };
};
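To make the two width-based decisions in the hunk above concrete, here is a standalone sketch, not the JSC code, with invented helper names: a Move32 between Tmps may be coalesced unless the source may carry non-zero high bits and the destination's high bits are observable, and a spilled Tmp gets a 4-byte stack slot exactly when its width is at most Width32.

    enum Width { Width8, Width16, Width32, Width64 };

    // Hypothetical helpers named for this sketch; the real code queries TmpWidth directly.
    bool move32MayBeCoalesced(Width srcDefWidth, Width dstUseWidth)
    {
        // Mirrors mayBeCoalescableImpl(): bail only if the source may have non-zero
        // high bits *and* the destination's high bits are observable.
        return !(srcDefWidth > Width32 && dstUseWidth > Width32);
    }

    unsigned spillSlotBytes(Width tmpWidth)
    {
        // Mirrors the stack slot allocation above: narrow Tmps get 4-byte slots.
        return tmpWidth <= Width32 ? 4 : 8;
    }

    int main()
    {
        bool ok = move32MayBeCoalesced(Width32, Width64)  // source already zero-filled
            && !move32MayBeCoalesced(Width64, Width64)    // neither condition holds
            && spillSlotBytes(Width32) == 4
            && spillSlotBytes(Width64) == 8;
        return ok ? 0 : 1;
    }
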
Modified: trunk/Source/JavaScriptCore/b3/air/AirLiveness.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirLiveness.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirLiveness.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -90,7 +90,7 @@
typename Adapter::IndexSet& liveAtTail = m_liveAtTail[block];
block->last().forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isLateUse(role) && Adapter::acceptsType(type))
liveAtTail.add(Adapter::valueToIndex(thing));
});
@@ -216,14 +216,14 @@
auto& workset = m_liveness.m_workset;
// First handle def's.
inst.forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isDef(role) && Adapter::acceptsType(type))
workset.remove(Adapter::valueToIndex(thing));
});
// Then handle use's.
inst.forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isEarlyUse(role) && Adapter::acceptsType(type))
workset.add(Adapter::valueToIndex(thing));
});
@@ -232,7 +232,7 @@
if (instIndex) {
Inst& prevInst = m_block->at(instIndex - 1);
prevInst.forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isLateUse(role) && Adapter::acceptsType(type))
workset.add(Adapter::valueToIndex(thing));
});
Modified: trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes 2015-12-21 16:16:01 UTC (rev 194331)
@@ -23,14 +23,17 @@
# Syllabus:
#
-# Roles and types:
-# U:G => use of a general-purpose register or value
-# D:G => def of a general-purpose register or value
-# UD:G => use and def of a general-purpose register or value
-# UA:G => UseAddr (see comment in Arg.h)
-# U:F => use of a float register or value
-# D:F => def of a float register or value
-# UD:F => use and def of a float register or value
+# Examples of some roles, types, and widths:
+# U:G:32 => use of the low 32 bits of a general-purpose register or value
+# D:G:32 => def of the low 32 bits of a general-purpose register or value
+# UD:G:32 => use and def of the low 32 bits of a general-purpose register or value
+# U:G:64 => use of the low 64 bits of a general-purpose register or value
+# ZD:G:32 => def of all bits of a general-purpose register, where all but the low 32 bits are guaranteed to be zeroed.
+# UA:G:Ptr => UseAddr (see comment in Arg.h)
+# U:F:32 => use of a float register or value
+# U:F:64 => use of a double register or value
+# D:F:32 => def of a float register or value
+# UD:F:32 => use and def of a float register or value
#
# Argument kinds:
# Tmp => temporary or register
@@ -44,11 +47,11 @@
# of things. So, although this file uses a particular indentation style, none of the whitespace or
# even newlines are meaningful to the parser. For example, you could write:
#
-# Foo42 U:G, UD:F Imm, Tmp Addr, Tmp
+# Foo42 U:G:32, UD:F:32 Imm, Tmp Addr, Tmp
#
# And the parser would know that this is the same as:
#
-# Foo42 U:G, UD:F
+# Foo42 U:G:32, UD:F:32
# Imm, Tmp
# Addr, Tmp
#
@@ -58,22 +61,22 @@
# union of those architectures. For example, if this is the only overload of the opcode, then it makes the
# opcode only available on x86_64:
#
-# x86_64: Fuzz UD:G, D:G
+# x86_64: Fuzz UD:G:64, D:G:64
# Tmp, Tmp
# Tmp, Addr
#
# But this only restricts the two-operand form, the other form is allowed on all architectures:
#
-# x86_64: Fuzz UD:G, D:G
+# x86_64: Fuzz UD:G:64, D:G:64
# Tmp, Tmp
# Tmp, Addr
-# Fuzz UD:G, D:G, U:F
+# Fuzz UD:G:Ptr, D:G:Ptr, U:F:Ptr
# Tmp, Tmp, Tmp
# Tmp, Addr, Tmp
#
# And you can also restrict individual forms:
#
-# Thingy UD:G, D:G
+# Thingy UD:G:32, D:G:32
# Tmp, Tmp
# arm64: Tmp, Addr
#
@@ -81,7 +84,7 @@
# form. In this example, the version that takes an address is only available on armv7 while the other
# versions are available on armv7 or x86_64:
#
-# x86_64 armv7: Buzz U:G, UD:F
+# x86_64 armv7: Buzz U:G:32, UD:F:32
# Tmp, Tmp
# Imm, Tmp
# armv7: Addr, Tmp
@@ -103,214 +106,214 @@
Nop
-Add32 U:G, UD:G
+Add32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-Add32 U:G, U:G, D:G
+Add32 U:G:32, U:G:32, ZD:G:32
Imm, Tmp, Tmp
Tmp, Tmp, Tmp
-64: Add64 U:G, UD:G
+64: Add64 U:G:64, UD:G:64
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-64: Add64 U:G, U:G, D:G
+64: Add64 U:G:64, U:G:64, D:G:64
Imm, Tmp, Tmp
Tmp, Tmp, Tmp
-AddDouble U:F, UD:F
+AddDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-AddFloat U:F, UD:F
+AddFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-Sub32 U:G, UD:G
+Sub32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-64: Sub64 U:G, UD:G
+64: Sub64 U:G:64, UD:G:64
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-SubDouble U:F, UD:F
+SubDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-SubFloat U:F, UD:F
+SubFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-Neg32 UD:G
+Neg32 UZD:G:32
Tmp
Addr
-64: Neg64 UD:G
+64: Neg64 UD:G:64
Tmp
-Mul32 U:G, UD:G
+Mul32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Addr, Tmp
-Mul32 U:G, U:G, D:G
+Mul32 U:G:32, U:G:32, ZD:G:32
Imm, Tmp, Tmp
-64: Mul64 U:G, UD:G
+64: Mul64 U:G:64, UD:G:64
Tmp, Tmp
-MulDouble U:F, UD:F
+MulDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-MulFloat U:F, UD:F
+MulFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-DivDouble U:F, UD:F
+DivDouble U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-DivFloat U:F, UD:F
+DivFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-x86: X86ConvertToDoubleWord32 U:G, D:G
+x86: X86ConvertToDoubleWord32 U:G:32, ZD:G:32
Tmp*, Tmp*
-x86_64: X86ConvertToQuadWord64 U:G, D:G
+x86_64: X86ConvertToQuadWord64 U:G:64, D:G:64
Tmp*, Tmp*
-x86: X86Div32 UD:G, UD:G, U:G
+x86: X86Div32 UZD:G:32, UZD:G:32, U:G:32
Tmp*, Tmp*, Tmp
-x86_64: X86Div64 UD:G, UD:G, U:G
+x86_64: X86Div64 UZD:G:64, UZD:G:64, U:G:64
Tmp*, Tmp*, Tmp
-Lea UA:G, D:G
+Lea UA:G:Ptr, D:G:Ptr
Addr, Tmp
-And32 U:G, UD:G
+And32 U:G:32, UZD:G:32
Tmp, Tmp
Imm, Tmp
x86: Tmp, Addr
x86: Addr, Tmp
x86: Imm, Addr
-64: And64 U:G, UD:G
+64: And64 U:G:64, UD:G:64
Tmp, Tmp
Imm, Tmp
-AndDouble U:F, UD:F
+AndDouble U:F:64, UD:F:64
Tmp, Tmp
-AndFloat U:F, UD:F
+AndFloat U:F:32, UD:F:32
Tmp, Tmp
-Lshift32 U:G, UD:G
+Lshift32 U:G:32, UZD:G:32
Tmp*, Tmp
Imm, Tmp
-64: Lshift64 U:G, UD:G
+64: Lshift64 U:G:64, UD:G:64
Tmp*, Tmp
Imm, Tmp
-Rshift32 U:G, UD:G
+Rshift32 U:G:32, UZD:G:32
Tmp*, Tmp
Imm, Tmp
-64: Rshift64 U:G, UD:G
+64: Rshift64 U:G:64, UD:G:64
Tmp*, Tmp
Imm, Tmp
-Urshift32 U:G, UD:G
+Urshift32 U:G:32, UZD:G:32
Tmp*, Tmp
Imm, Tmp
-64: Urshift64 U:G, UD:G
+64: Urshift64 U:G:64, UD:G:64
Tmp*, Tmp
Imm, Tmp
-Or32 U:G, UD:G
+Or32 U:G:32, UZD:G:32
Tmp, Tmp
Imm, Tmp
x86: Tmp, Addr
x86: Addr, Tmp
x86: Imm, Addr
-64: Or64 U:G, UD:G
+64: Or64 U:G:64, UD:G:64
Tmp, Tmp
Imm, Tmp
-Xor32 U:G, UD:G
+Xor32 U:G:32, UZD:G:32
Tmp, Tmp
Imm, Tmp
x86: Tmp, Addr
x86: Addr, Tmp
x86: Imm, Addr
-64: Xor64 U:G, UD:G
+64: Xor64 U:G:64, UD:G:64
Tmp, Tmp
x86: Tmp, Addr
Imm, Tmp
-Not32 UD:G
+Not32 UZD:G:32
Tmp
x86: Addr
-64: Not64 UD:G
+64: Not64 UD:G:64
Tmp
x86: Addr
-CeilDouble U:F, UD:F
+CeilDouble U:F:64, UD:F:64
Tmp, Tmp
Addr, Tmp
-CeilFloat U:F, UD:F
+CeilFloat U:F:32, UD:F:32
Tmp, Tmp
Addr, Tmp
-SqrtDouble U:F, UD:F
+SqrtDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-SqrtFloat U:F, UD:F
+SqrtFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-ConvertInt32ToDouble U:G, D:F
+ConvertInt32ToDouble U:G:32, D:F:64
Tmp, Tmp
x86: Addr, Tmp
-64: ConvertInt64ToDouble U:G, D:F
+64: ConvertInt64ToDouble U:G:64, D:F:64
Tmp, Tmp
-CountLeadingZeros32 U:G, D:G
+CountLeadingZeros32 U:G:32, ZD:G:32
Tmp, Tmp
x86: Addr, Tmp
-64: CountLeadingZeros64 U:G, D:G
+64: CountLeadingZeros64 U:G:64, D:G:64
Tmp, Tmp
x86: Addr, Tmp
-ConvertDoubleToFloat U:F, D:F
+ConvertDoubleToFloat U:F:64, D:F:32
Tmp, Tmp
x86: Addr, Tmp
-ConvertFloatToDouble U:F, D:F
+ConvertFloatToDouble U:F:32, D:F:64
Tmp, Tmp
x86: Addr, Tmp
@@ -318,7 +321,7 @@
# the platform. I'm not entirely sure that this is a good thing; it might be better to just have a
# Move64 instruction. OTOH, our MacroAssemblers already have this notion of "move()" that basically
# means movePtr.
-Move U:G, D:G
+Move U:G:Ptr, D:G:Ptr
Tmp, Tmp
Imm, Tmp as signExtend32ToPtr
Imm64, Tmp
@@ -328,7 +331,7 @@
Tmp, Index as storePtr
Imm, Addr as storePtr
-Move32 U:G, D:G
+Move32 U:G:32, ZD:G:32
Tmp, Tmp as zeroExtend32ToPtr
Addr, Tmp as load32
Index, Tmp as load32
@@ -337,118 +340,118 @@
Imm, Addr as store32
Imm, Index as store32
-SignExtend32ToPtr U:G, D:G
+SignExtend32ToPtr U:G:32, D:G:Ptr
Tmp, Tmp
-ZeroExtend8To32 U:G, D:G
+ZeroExtend8To32 U:G:8, ZD:G:32
Tmp, Tmp
Addr, Tmp as load8
Index, Tmp as load8
-SignExtend8To32 U:G, D:G
+SignExtend8To32 U:G:8, ZD:G:32
Tmp, Tmp
x86: Addr, Tmp as load8SignedExtendTo32
Index, Tmp as load8SignedExtendTo32
-ZeroExtend16To32 U:G, D:G
+ZeroExtend16To32 U:G:16, ZD:G:32
Tmp, Tmp
Addr, Tmp as load16
Index, Tmp as load16
-SignExtend16To32 U:G, D:G
+SignExtend16To32 U:G:16, ZD:G:32
Tmp, Tmp
Addr, Tmp as load16SignedExtendTo32
Index, Tmp as load16SignedExtendTo32
-MoveFloat U:F, D:F
+MoveFloat U:F:32, D:F:32
Tmp, Tmp as moveDouble
Addr, Tmp as loadFloat
Index, Tmp as loadFloat
Tmp, Addr as storeFloat
Tmp, Index as storeFloat
-MoveDouble U:F, D:F
+MoveDouble U:F:64, D:F:64
Tmp, Tmp
Addr, Tmp as loadDouble
Index, Tmp as loadDouble
Tmp, Addr as storeDouble
Tmp, Index as storeDouble
-MoveZeroToDouble D:F
+MoveZeroToDouble D:F:64
Tmp
-64: Move64ToDouble U:G, D:F
+64: Move64ToDouble U:G:64, D:F:64
Tmp, Tmp
Addr, Tmp as loadDouble
Index, Tmp as loadDouble
-MoveInt32ToPacked U:G, D:F
+MoveInt32ToPacked U:G:32, D:F:32
Tmp, Tmp
Addr, Tmp as loadFloat
Index, Tmp as loadFloat
-64: MoveDoubleTo64 U:F, D:G
+64: MoveDoubleTo64 U:F:64, D:G:64
Tmp, Tmp
Addr, Tmp as load64
Index, Tmp as load64
-MovePackedToInt32 U:F, D:G
+MovePackedToInt32 U:F:32, D:G:32
Tmp, Tmp
Addr, Tmp as load32
Index, Tmp as load32
-Load8 U:G, D:G
+Load8 U:G:8, ZD:G:32
Addr, Tmp
Index, Tmp
-Store8 U:G, D:G
+Store8 U:G:8, D:G:8
Tmp, Index
Tmp, Addr
Imm, Index
Imm, Addr
-Load8SignedExtendTo32 U:G, D:G
+Load8SignedExtendTo32 U:G:8, ZD:G:32
Addr, Tmp
Index, Tmp
-Load16 U:G, D:G
+Load16 U:G:16, ZD:G:32
Addr, Tmp
Index, Tmp
-Load16SignedExtendTo32 U:G, D:G
+Load16SignedExtendTo32 U:G:16, ZD:G:32
Addr, Tmp
Index, Tmp
-Compare32 U:G, U:G, U:G, D:G
+Compare32 U:G:32, U:G:32, U:G:32, ZD:G:32
RelCond, Tmp, Tmp, Tmp
RelCond, Tmp, Imm, Tmp
-64: Compare64 U:G, U:G, U:G, D:G
+64: Compare64 U:G:32, U:G:64, U:G:64, ZD:G:32
RelCond, Tmp, Imm, Tmp
RelCond, Tmp, Tmp, Tmp
-Test32 U:G, U:G, U:G, D:G
+Test32 U:G:32, U:G:32, U:G:32, ZD:G:32
x86: ResCond, Addr, Imm, Tmp
ResCond, Tmp, Tmp, Tmp
-64: Test64 U:G, U:G, U:G, D:G
+64: Test64 U:G:32, U:G:64, U:G:64, ZD:G:32
ResCond, Tmp, Imm, Tmp
ResCond, Tmp, Tmp, Tmp
-CompareDouble U:G, U:F, U:F, D:G
+CompareDouble U:G:32, U:F:64, U:F:64, ZD:G:32
DoubleCond, Tmp, Tmp, Tmp
-CompareFloat U:G, U:F, U:F, D:G
+CompareFloat U:G:32, U:F:32, U:F:32, ZD:G:32
DoubleCond, Tmp, Tmp, Tmp
# Note that branches have some logic in AirOptimizeBlockOrder.cpp. If you add new branches, please make sure
# you opt them into the block order optimizations.
-Branch8 U:G, U:G, U:G /branch
+Branch8 U:G:32, U:G:8, U:G:8 /branch
x86: RelCond, Addr, Imm
x86: RelCond, Index, Imm
-Branch32 U:G, U:G, U:G /branch
+Branch32 U:G:32, U:G:32, U:G:32 /branch
x86: RelCond, Addr, Imm
RelCond, Tmp, Tmp
RelCond, Tmp, Imm
@@ -456,17 +459,17 @@
x86: RelCond, Addr, Tmp
x86: RelCond, Index, Imm
-64: Branch64 U:G, U:G, U:G /branch
+64: Branch64 U:G:32, U:G:64, U:G:64 /branch
RelCond, Tmp, Tmp
x86: RelCond, Tmp, Addr
x86: RelCond, Addr, Tmp
x86: RelCond, Index, Tmp
-BranchTest8 U:G, U:G, U:G /branch
+BranchTest8 U:G:32, U:G:8, U:G:8 /branch
x86: ResCond, Addr, Imm
x86: ResCond, Index, Imm
-BranchTest32 U:G, U:G, U:G /branch
+BranchTest32 U:G:32, U:G:32, U:G:32 /branch
ResCond, Tmp, Tmp
ResCond, Tmp, Imm
x86: ResCond, Addr, Imm
@@ -474,95 +477,95 @@
# Warning: forms that take an immediate will sign-extend their immediate. You probably want
# BranchTest32 in most cases where you use an immediate.
-64: BranchTest64 U:G, U:G, U:G /branch
+64: BranchTest64 U:G:32, U:G:64, U:G:64 /branch
ResCond, Tmp, Tmp
ResCond, Tmp, Imm
x86: ResCond, Addr, Imm
x86: ResCond, Addr, Tmp
x86: ResCond, Index, Imm
-BranchDouble U:G, U:F, U:F /branch
+BranchDouble U:G:32, U:F:64, U:F:64 /branch
DoubleCond, Tmp, Tmp
-BranchFloat U:G, U:F, U:F /branch
+BranchFloat U:G:32, U:F:32, U:F:32 /branch
DoubleCond, Tmp, Tmp
-BranchAdd32 U:G, U:G, UD:G /branch
+BranchAdd32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
ResCond, Imm, Tmp
x86: ResCond, Imm, Addr
x86: ResCond, Tmp, Addr
x86: ResCond, Addr, Tmp
-64: BranchAdd64 U:G, U:G, UD:G /branch
+64: BranchAdd64 U:G:32, U:G:64, UD:G:64 /branch
ResCond, Imm, Tmp
ResCond, Tmp, Tmp
-BranchMul32 U:G, U:G, UD:G /branch
+BranchMul32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
x86: ResCond, Addr, Tmp
-BranchMul32 U:G, U:G, U:G, D:G /branch
+BranchMul32 U:G:32, U:G:32, U:G:32, ZD:G:32 /branch
ResCond, Tmp, Imm, Tmp
-64: BranchMul64 U:G, U:G, UD:G /branch
+64: BranchMul64 U:G:32, U:G:64, UZD:G:64 /branch
ResCond, Tmp, Tmp
-BranchSub32 U:G, U:G, UD:G /branch
+BranchSub32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
ResCond, Imm, Tmp
x86: ResCond, Imm, Addr
x86: ResCond, Tmp, Addr
x86: ResCond, Addr, Tmp
-64: BranchSub64 U:G, U:G, UD:G /branch
+64: BranchSub64 U:G:32, U:G:64, UD:G:64 /branch
ResCond, Imm, Tmp
ResCond, Tmp, Tmp
-BranchNeg32 U:G, UD:G /branch
+BranchNeg32 U:G:32, UZD:G:32 /branch
ResCond, Tmp
-64: BranchNeg64 U:G, UD:G /branch
+64: BranchNeg64 U:G:32, UZD:G:64 /branch
ResCond, Tmp
-MoveConditionally32 U:G, U:G, U:G, U:G, UD:G
+MoveConditionally32 U:G:32, U:G:32, U:G:32, U:G:Ptr, UD:G:Ptr
RelCond, Tmp, Tmp, Tmp, Tmp
-64: MoveConditionally64 U:G, U:G, U:G, U:G, UD:G
+64: MoveConditionally64 U:G:32, U:G:64, U:G:64, U:G:Ptr, UD:G:Ptr
RelCond, Tmp, Tmp, Tmp, Tmp
-MoveConditionallyTest32 U:G, U:G, U:G, U:G, UD:G
+MoveConditionallyTest32 U:G:32, U:G:32, U:G:32, U:G:Ptr, UD:G:Ptr
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-64: MoveConditionallyTest64 U:G, U:G, U:G, U:G, UD:G
+64: MoveConditionallyTest64 U:G:32, U:G:64, U:G:64, U:G:Ptr, UD:G:Ptr
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-MoveConditionallyDouble U:G, U:F, U:F, U:G, UD:G
+MoveConditionallyDouble U:G:32, U:F:64, U:F:64, U:G:Ptr, UD:G:Ptr
DoubleCond, Tmp, Tmp, Tmp, Tmp
-MoveConditionallyFloat U:G, U:F, U:F, U:G, UD:G
+MoveConditionallyFloat U:G:32, U:F:32, U:F:32, U:G:Ptr, UD:G:Ptr
DoubleCond, Tmp, Tmp, Tmp, Tmp
-MoveDoubleConditionally32 U:G, U:G, U:G, U:F, UD:F
+MoveDoubleConditionally32 U:G:32, U:G:32, U:G:32, U:F:64, UD:F:64
RelCond, Tmp, Tmp, Tmp, Tmp
-64: MoveDoubleConditionally64 U:G, U:G, U:G, U:F, UD:F
+64: MoveDoubleConditionally64 U:G:32, U:G:64, U:G:64, U:F:64, UD:F:64
RelCond, Tmp, Tmp, Tmp, Tmp
-MoveDoubleConditionallyTest32 U:G, U:G, U:G, U:F, UD:F
+MoveDoubleConditionallyTest32 U:G:32, U:G:32, U:G:32, U:F:64, UD:F:64
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-64: MoveDoubleConditionallyTest64 U:G, U:G, U:G, U:F, UD:F
+64: MoveDoubleConditionallyTest64 U:G:32, U:G:64, U:G:64, U:F:64, UD:F:64
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-MoveDoubleConditionallyDouble U:G, U:F, U:F, U:F, UD:F
+MoveDoubleConditionallyDouble U:G:32, U:F:64, U:F:64, U:F:64, UD:F:64
DoubleCond, Tmp, Tmp, Tmp, Tmp
-MoveDoubleConditionallyFloat U:G, U:F, U:F, U:F, UD:F
+MoveDoubleConditionallyFloat U:G:32, U:F:32, U:F:32, U:F:64, UD:F:64
DoubleCond, Tmp, Tmp, Tmp, Tmp
Jump /branch
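As an illustration of the new syllabus notation above, and not generated code, the sketch below shows what the two-operand form "Add32 U:G:32, UZD:G:32 / Tmp, Tmp" means once opcode_generator.rb has produced forEachArg: the first argument is a 32-bit use of a GP value and the second is a combined use and zero-extending def of the low 32 bits. The enums and the mock function are invented for the example.

    #include <cstdio>

    enum Role { Use, UseZDef };      // subset of Arg::Role
    enum Type { GP };                // subset of Arg::Type
    enum Width { Width32, Width64 }; // subset of Arg::Width

    // Mock of what the generated Inst::forEachArg reports for "Add32 %src, %dst" (Tmp, Tmp form).
    template<typename Functor>
    void forEachArgOfAdd32(const Functor& functor)
    {
        functor(0, Use, GP, Width32);     // U:G:32   -> reads the low 32 bits of %src
        functor(1, UseZDef, GP, Width32); // UZD:G:32 -> reads and writes the low 32 bits of %dst,
                                          //             zero-filling the rest if %dst is a register
    }

    int main()
    {
        forEachArgOfAdd32([] (int index, Role role, Type, Width) {
            std::printf("arg %d: %s\n", index, role == UseZDef ? "UseZDef" : "Use");
        });
        return 0;
    }
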
Modified: trunk/Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -148,6 +148,7 @@
case BranchTest8:
case BranchTest32:
case BranchTest64:
+ case BranchFloat:
case BranchDouble:
case BranchAdd32:
case BranchAdd64:
Modified: trunk/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -67,7 +67,7 @@
// code is suboptimal.
inst.forEachTmpWithExtraClobberedRegs(
index < block->size() ? &block->at(index) : nullptr,
- [&registerSet] (const Tmp& tmp, Arg::Role role, Arg::Type) {
+ [&registerSet] (const Tmp& tmp, Arg::Role role, Arg::Type, Arg::Width) {
if (tmp.isReg() && Arg::isDef(role))
registerSet.set(tmp.reg());
});
@@ -119,7 +119,7 @@
// Now fall back on spilling using separate Move's to load/store the tmp.
inst.forEachTmp(
- [&] (Tmp& tmp, Arg::Role role, Arg::Type type) {
+ [&] (Tmp& tmp, Arg::Role role, Arg::Type type, Arg::Width) {
if (tmp.isReg())
return;
@@ -140,6 +140,7 @@
}
break;
case Arg::Def:
+ case Arg::ZDef:
for (Reg reg : regsInPriorityOrder(type)) {
if (!setAfter.get(reg)) {
setAfter.set(reg);
@@ -149,6 +150,7 @@
}
break;
case Arg::UseDef:
+ case Arg::UseZDef:
case Arg::LateUse:
for (Reg reg : regsInPriorityOrder(type)) {
if (!setBefore.get(reg) && !setAfter.get(reg)) {
Added: trunk/Source/JavaScriptCore/b3/air/AirTmpWidth.cpp (0 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirTmpWidth.cpp (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirTmpWidth.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -0,0 +1,144 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include "AirTmpWidth.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirCode.h"
+#include "AirInstInlines.h"
+
+namespace JSC { namespace B3 { namespace Air {
+
+TmpWidth::TmpWidth()
+{
+}
+
+TmpWidth::TmpWidth(Code& code)
+{
+ recompute(code);
+}
+
+TmpWidth::~TmpWidth()
+{
+}
+
+void TmpWidth::recompute(Code& code)
+{
+ m_width.clear();
+
+ // Assume the worst for registers.
+ RegisterSet::allRegisters().forEach(
+ [&] (Reg reg) {
+ Widths& widths = m_width.add(Tmp(reg), Widths()).iterator->value;
+ Arg::Type type = Arg(Tmp(reg)).type();
+ widths.use = Arg::conservativeWidth(type);
+ widths.def = Arg::conservativeWidth(type);
+ });
+
+ // Now really analyze everything but Move's over Tmp's, but set aside those Move's so we can find
+ // them quickly during the fixpoint below. Note that we can make this analysis stronger by
+ // recognizing more kinds of Move's or anything that has Move-like behavior, though it's probably not
+ // worth it.
+ Vector<Inst*> moves;
+ for (BasicBlock* block : code) {
+ for (Inst& inst : *block) {
+ if (inst.opcode == Move && inst.args[1].isTmp()) {
+ if (inst.args[0].isTmp()) {
+ moves.append(&inst);
+ continue;
+ }
+ if (inst.args[0].isImm()
+ && inst.args[0].value() >= 0) {
+ Tmp tmp = inst.args[1].tmp();
+ Widths& widths = m_width.add(tmp, Widths(Arg::GP)).iterator->value;
+
+ if (inst.args[0].value() <= std::numeric_limits<int8_t>::max())
+ widths.def = std::max(widths.def, Arg::Width8);
+ else if (inst.args[0].value() <= std::numeric_limits<int16_t>::max())
+ widths.def = std::max(widths.def, Arg::Width16);
+ else if (inst.args[0].value() <= std::numeric_limits<int32_t>::max())
+ widths.def = std::max(widths.def, Arg::Width32);
+ else
+ widths.def = std::max(widths.def, Arg::Width64);
+
+ continue;
+ }
+ }
+ inst.forEachTmp(
+ [&] (Tmp& tmp, Arg::Role role, Arg::Type type, Arg::Width width) {
+ Widths& widths = m_width.add(tmp, Widths(type)).iterator->value;
+
+ if (Arg::isAnyUse(role))
+ widths.use = std::max(widths.use, width);
+
+ if (Arg::isZDef(role))
+ widths.def = std::max(widths.def, width);
+ else if (Arg::isDef(role))
+ widths.def = Arg::conservativeWidth(type);
+ });
+ }
+ }
+
+ // Finally, fixpoint over the Move's.
+ bool changed = true;
+ while (changed) {
+ changed = false;
+ for (Inst* move : moves) {
+ ASSERT(move->opcode == Move);
+ ASSERT(move->args[0].isTmp());
+ ASSERT(move->args[1].isTmp());
+
+ Widths& srcWidths = m_width.add(move->args[0].tmp(), Widths(Arg::GP)).iterator->value;
+ Widths& dstWidths = m_width.add(move->args[1].tmp(), Widths(Arg::GP)).iterator->value;
+
+ // Legend:
+ //
+ // Move %src, %dst
+
+ // defWidth(%dst) is a promise about how many high bits are zero. The smaller the width, the
+ // stronger the promise. This Move may weaken that promise if we know that %src is making a
+ // weaker promise. Such forward flow is the only thing that determines defWidth().
+ if (dstWidths.def < srcWidths.def) {
+ dstWidths.def = srcWidths.def;
+ changed = true;
+ }
+
+ // useWidth(%src) is a promise about how many high bits are ignored. The smaller the width,
+ // the stronger the promise. This Move may weaken that promise if we know that %dst is making
+ // a weaker promise. Such backward flow is the only thing that determines useWidth().
+ if (srcWidths.use < dstWidths.use) {
+ srcWidths.use = dstWidths.use;
+ changed = true;
+ }
+ }
+ }
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
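To make the forward/backward flow in recompute()'s fixpoint concrete, here is a small standalone sketch. It is not part of this patch: plain integers stand in for Arg::Width, vector indices stand in for Tmps, and the particular tmps, moves, and starting widths are invented for illustration.

    // Minimal model of TmpWidth's fixpoint over Move's between Tmp's.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Widths {
        unsigned use { 8 }; // stand-in for Arg::minimumWidth(Arg::GP)
        unsigned def { 8 };
    };

    struct Move {
        unsigned src;
        unsigned dst;
    };

    int main()
    {
        std::vector<Widths> widths(3);
        widths[0].def = 32; // t0 is defined by a 32-bit op that zero-extends (e.g. Add32)
        widths[2].use = 64; // t2 feeds a 64-bit address computation, so all bits are read

        // Move t0, t1 and Move t1, t2 were set aside by the main analysis.
        std::vector<Move> moves { { 0, 1 }, { 1, 2 } };

        bool changed = true;
        while (changed) {
            changed = false;
            for (Move move : moves) {
                Widths& src = widths[move.src];
                Widths& dst = widths[move.dst];

                // Forward flow: the destination's "high bits are zero" promise can only
                // be as strong as the source's.
                if (dst.def < src.def) {
                    dst.def = src.def;
                    changed = true;
                }

                // Backward flow: the source's "high bits are ignored" promise can only
                // be as strong as the destination's.
                if (src.use < dst.use) {
                    src.use = dst.use;
                    changed = true;
                }
            }
        }

        for (unsigned i = 0; i < widths.size(); ++i) {
            unsigned width = std::min(widths[i].use, widths[i].def); // same rule as TmpWidth::width()
            std::printf("t%u: use=%u def=%u width=%u\n", i, widths[i].use, widths[i].def, width);
        }
        return 0;
    }

All three tmps end up with an effective width of 32: the zero-extending def flows forward through both moves, so min(use, def) stays at 32 even though the 64-bit use flows backward and widens every use width.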
Added: trunk/Source/_javascript_Core/b3/air/AirTmpWidth.h (0 => 194331)
--- trunk/Source/_javascript_Core/b3/air/AirTmpWidth.h (rev 0)
+++ trunk/Source/_javascript_Core/b3/air/AirTmpWidth.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef AirTmpWidth_h
+#define AirTmpWidth_h
+
+#if ENABLE(B3_JIT)
+
+#include "AirArg.h"
+#include <wtf/HashSet.h>
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+class TmpWidth {
+public:
+ TmpWidth();
+ TmpWidth(Code&);
+ ~TmpWidth();
+
+ void recompute(Code&);
+
+ // The width of a Tmp is the number of bits that you need to be able to track without some trivial
+ // recovery. A Tmp may have a "subwidth" (say, Width32 on a 64-bit system) if either of the following
+ // is true:
+ //
+ // - The high bits are never read.
+ // - The high bits are always zero.
+ //
+ // This doesn't tell you which of those properties holds, but you can query that using the other
+ // methods.
+ Arg::Width width(Tmp tmp) const
+ {
+ auto iter = m_width.find(tmp);
+ if (iter == m_width.end())
+ return Arg::minimumWidth(Arg(tmp).type());
+ return std::min(iter->value.use, iter->value.def);
+ }
+
+ // This indirectly tells you how many of the tmp's high bits are guaranteed to be zero. The number
+ // of high bits that are zero is:
+ //
+ // TotalBits - defWidth(tmp)
+ //
+ // where TotalBits is the total number of bits in the register, so 64 on a 64-bit system.
+ Arg::Width defWidth(Tmp tmp) const
+ {
+ auto iter = m_width.find(tmp);
+ if (iter == m_width.end())
+ return Arg::minimumWidth(Arg(tmp).type());
+ return iter->value.def;
+ }
+
+ // This tells you how many of the Tmp's bits are going to be read.
+ Arg::Width useWidth(Tmp tmp) const
+ {
+ auto iter = m_width.find(tmp);
+ if (iter == m_width.end())
+ return Arg::minimumWidth(Arg(tmp).type());
+ return iter->value.use;
+ }
+
+private:
+ struct Widths {
+ Widths() { }
+
+ Widths(Arg::Type type)
+ {
+ use = Arg::minimumWidth(type);
+ def = Arg::minimumWidth(type);
+ }
+
+ Arg::Width use;
+ Arg::Width def;
+ };
+
+ HashMap<Tmp, Widths> m_width;
+};
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirTmpWidth_h
+
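The two properties spelled out in the class comment, high bits never read versus high bits always zero, are what make it safe to drop a zero-extending move in some situations. The sketch below is not the register allocator's actual coalescing rule; it just illustrates one sufficient condition for treating a Move32 between two tmps as redundant. The Widths struct, the numeric widths, and move32IsCoalescable are all invented for the example.

    #include <cstdio>

    // Per-tmp use/def widths, with plain integers standing in for Arg::Width.
    struct Widths {
        unsigned use;
        unsigned def;
    };

    // Move32 %src, %dst zero-extends %src into %dst. Coalescing the two tmps drops
    // that zero-extension, which is harmless if either:
    //   - %src's high bits are already known to be zero (defWidth(src) <= 32), or
    //   - %dst's high bits are never read (useWidth(dst) <= 32).
    static bool move32IsCoalescable(const Widths& src, const Widths& dst)
    {
        return src.def <= 32 || dst.use <= 32;
    }

    int main()
    {
        Widths src { 32, 32 }; // e.g. defined by a 32-bit op and used as an array index
        Widths dst { 64, 64 }; // e.g. read by a 64-bit address computation
        std::printf("coalescable: %s\n", move32IsCoalescable(src, dst) ? "yes" : "no");
        return 0;
    }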
Modified: trunk/Source/_javascript_Core/b3/air/AirUseCounts.h (194330 => 194331)
--- trunk/Source/_javascript_Core/b3/air/AirUseCounts.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/b3/air/AirUseCounts.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -77,7 +77,7 @@
frequency *= Options::rareBlockPenalty();
for (Inst& inst : *block) {
inst.forEach<Thing>(
- [&] (Thing& arg, Arg::Role role, Arg::Type) {
+ [&] (Thing& arg, Arg::Role role, Arg::Type, Arg::Width) {
Counts& counts = m_counts.add(arg, Counts()).iterator->value;
if (Arg::isWarmUse(role))
Modified: trunk/Source/_javascript_Core/b3/air/opcode_generator.rb (194330 => 194331)
--- trunk/Source/_javascript_Core/b3/air/opcode_generator.rb 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/b3/air/opcode_generator.rb 2015-12-21 16:16:01 UTC (rev 194331)
@@ -44,11 +44,12 @@
end
class Arg
- attr_reader :role, :type
+ attr_reader :role, :type, :width
- def initialize(role, type)
+ def initialize(role, type, width)
@role = role
@type = type
+ @width = width
end
end
@@ -173,7 +174,7 @@
end
def isUD(token)
- token =~ /\A((U)|(D)|(UD)|(UA))\Z/
+ token =~ /\A((U)|(D)|(UD)|(ZD)|(UZD)|(UA))\Z/
end
def isGF(token)
@@ -188,8 +189,12 @@
token =~ /\A((x86)|(x86_32)|(x86_64)|(arm)|(armv7)|(arm64)|(32)|(64))\Z/
end
+def isWidth(token)
+ token =~ /\A((8)|(16)|(32)|(64)|(Ptr))\Z/
+end
+
def isKeyword(token)
- isUD(token) or isGF(token) or isKind(token) or isArch(token) or
+ isUD(token) or isGF(token) or isKind(token) or isArch(token) or isWidth(token) or
token == "special" or token == "as"
end
@@ -256,6 +261,13 @@
result
end
+ def consumeWidth
+ result = token.string
+ parseError("Expected width (8, 16, 32, or 64)") unless isWidth(result)
+ advance
+ result
+ end
+
def parseArchs
return nil unless isArch(token)
@@ -350,8 +362,10 @@
role = consumeRole
consume(":")
type = consumeType
+ consume(":")
+ width = consumeWidth
- signature << Arg.new(role, type)
+ signature << Arg.new(role, type, width)
break unless token == ","
consume(",")
@@ -606,26 +620,37 @@
matchInstOverload(outp, :fast, "this") {
| opcode, overload |
if opcode.special
- outp.puts "functor(args[0], Arg::Use, Arg::GP); // This is basically bogus, but it works f analyses model Special as an immediate."
+ outp.puts "functor(args[0], Arg::Use, Arg::GP, Arg::pointerWidth()); // This is basically bogus, but it works for analyses that model Special as an immediate."
outp.puts "args[0].special()->forEachArg(*this, scopedLambda<EachArgCallback>(functor));"
else
overload.signature.each_with_index {
| arg, index |
+
role = nil
case arg.role
when "U"
role = "Use"
when "D"
role = "Def"
+ when "ZD"
+ role = "ZDef"
when "UD"
role = "UseDef"
+ when "UZD"
+ role = "UseZDef"
when "UA"
role = "UseAddr"
else
raise
end
+
+ if arg.width == "Ptr"
+ width = "Arg::pointerWidth()"
+ else
+ width = "Arg::Width#{arg.width}"
+ end
- outp.puts "functor(args[#{index}], Arg::#{role}, Arg::#{arg.type}P);"
+ outp.puts "functor(args[#{index}], Arg::#{role}, Arg::#{arg.type}P, #{width});"
}
end
}
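With these parser and emitter changes, every operand in an opcode description carries a role, a bank, and a width, and the generated forEachArg passes that width through to the functor. The sketch below approximates the shape of the emitted code for a hypothetical 32-bit add-like opcode described as "U:G:32, UZD:G:32"; the enums, the opcode description, and forEachArgOfAdd32Like are simplified stand-ins for Arg::Role, Arg::Type, Arg::Width, and the generator's real output.

    #include <cstdio>

    enum class Role { Use, Def, ZDef, UseDef, UseZDef, UseAddr };
    enum class Type { GP, FP };
    enum class Width { W8 = 8, W16 = 16, W32 = 32, W64 = 64 };

    struct Arg {
        const char* name;
    };

    template<typename Functor>
    void forEachArgOfAdd32Like(Arg* args, const Functor& functor)
    {
        // Roughly what the generator prints for "U:G:32, UZD:G:32": the second
        // operand is both read and zero-extended when written (UseZDef).
        functor(args[0], Role::Use, Type::GP, Width::W32);
        functor(args[1], Role::UseZDef, Type::GP, Width::W32);
    }

    int main()
    {
        Arg args[2] = { { "%x" }, { "%y" } };
        forEachArgOfAdd32Like(
            args,
            [] (Arg& arg, Role role, Type, Width width) {
                std::printf("%s role=%d width=%u\n", arg.name, static_cast<int>(role), static_cast<unsigned>(width));
            });
        return 0;
    }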
Modified: trunk/Source/_javascript_Core/b3/testb3.cpp (194330 => 194331)
--- trunk/Source/_javascript_Core/b3/testb3.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/b3/testb3.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -6368,6 +6368,61 @@
CHECK(invoke<int>(*code, &value - 2, 1) == 42);
}
+void testCheckTrickyMegaCombo()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* base = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+ Value* index = root->appendNew<Value>(
+ proc, ZExt32, Origin(),
+ root->appendNew<Value>(
+ proc, Add, Origin(),
+ root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1)),
+ root->appendNew<Const32Value>(proc, Origin(), 1)));
+
+ Value* ptr = root->appendNew<Value>(
+ proc, Add, Origin(), base,
+ root->appendNew<Value>(
+ proc, Shl, Origin(), index,
+ root->appendNew<Const32Value>(proc, Origin(), 1)));
+
+ CheckValue* check = root->appendNew<CheckValue>(
+ proc, Check, Origin(),
+ root->appendNew<Value>(
+ proc, LessThan, Origin(),
+ root->appendNew<MemoryValue>(proc, Load8S, Origin(), ptr),
+ root->appendNew<Const32Value>(proc, Origin(), 42)));
+ check->setGenerator(
+ [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+ AllowMacroScratchRegisterUsage allowScratch(jit);
+ CHECK(!params.size());
+
+ // This should always work because a function this simple should never have callee
+ // saves.
+ jit.move(CCallHelpers::TrustedImm32(42), GPRInfo::returnValueGPR);
+ jit.emitFunctionEpilogue();
+ jit.ret();
+ });
+ root->appendNew<ControlValue>(
+ proc, Return, Origin(), root->appendNew<Const32Value>(proc, Origin(), 0));
+
+ auto code = compile(proc);
+
+ int8_t value;
+ value = 42;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 0);
+ value = 127;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 0);
+ value = 41;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 42);
+ value = 0;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 42);
+ value = -1;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 42);
+}
+
void testCheckTwoMegaCombos()
{
Procedure proc;
@@ -9474,6 +9529,7 @@
RUN(testSimpleCheck());
RUN(testCheckLessThan());
RUN(testCheckMegaCombo());
+ RUN(testCheckTrickyMegaCombo());
RUN(testCheckTwoMegaCombos());
RUN(testCheckTwoNonRedundantMegaCombos());
RUN(testCheckAddImm());
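The arithmetic behind testCheckTrickyMegaCombo's expectations is worth spelling out. With the second argument equal to 0, the index is ZExt32(Trunc(0) + 1) = 1, the Shl by 1 scales that to a byte offset of 2, and the base passed in is &value - 2, so the Load8S lands exactly on value. If the high bits of the 32-bit add were garbage rather than zero, the computed pointer would be far away, which is presumably why the test zero-extends an Add of a truncated argument. The sketch below is plain C++, not JSC code, and replays that arithmetic with a small buffer standing in for the test's base pointer.

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        // The test passes &value - 2 as the base, so here the interesting byte
        // lives at offset 2 of a small buffer.
        int8_t buffer[3] = { 0, 0, 41 };
        int8_t* base = buffer;

        // index = ZExt32(Trunc(arg1) + 1) with arg1 == 0; ptr = base + (index << 1).
        uint64_t index = static_cast<uint32_t>(0 + 1);
        int8_t* ptr = base + (index << 1);

        // Loads 41, which is < 42, so the Check fires and the test expects 42 back.
        std::printf("loaded %d\n", *ptr);
        return 0;
    }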
Modified: trunk/Source/_javascript_Core/ftl/FTLLowerDFGToLLVM.cpp (194330 => 194331)
--- trunk/Source/_javascript_Core/ftl/FTLLowerDFGToLLVM.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/ftl/FTLLowerDFGToLLVM.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -4218,10 +4218,17 @@
LValue length = m_out.load32(kids[0], m_heaps.JSString_length);
for (unsigned i = 1; i < numKids; ++i) {
flags = m_out.bitAnd(flags, m_out.load32(kids[i], m_heaps.JSString_flags));
+#if FTL_USES_B3
+ B3::CheckValue* lengthCheck = m_out.speculateAdd(
+ length, m_out.load32(kids[i], m_heaps.JSString_length));
+ blessSpeculation(lengthCheck, Uncountable, noValue(), nullptr, m_origin);
+ length = lengthCheck;
+#else // FTL_USES_B3
LValue lengthAndOverflow = m_out.addWithOverflow32(
length, m_out.load32(kids[i], m_heaps.JSString_length));
speculate(Uncountable, noValue(), 0, m_out.extractValue(lengthAndOverflow, 1));
length = m_out.extractValue(lengthAndOverflow, 0);
+#endif // FTL_USES_B3
}
m_out.store32(
m_out.bitAnd(m_out.constInt32(JSString::Is8Bit), flags),
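The two sides of this hunk compute the same checked 32-bit addition in different shapes: the LLVM path gets a (sum, overflowed) pair back from addWithOverflow32 and extracts each half, while the B3 path gets a single CheckValue that serves as both the overflow check and the sum. The sketch below models only the pair shape; it uses the GCC/Clang builtin __builtin_add_overflow, which is not FTL code, purely to illustrate the result-plus-flag pattern.

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        int32_t length = 0x7fffffff; // INT32_MAX, so adding any positive kid length overflows
        int32_t kidLength = 1;

        int32_t sum;
        bool overflowed = __builtin_add_overflow(length, kidLength, &sum);
        if (overflowed)
            std::printf("rope length overflowed; this is where the code would speculate (OSR exit)\n");
        else
            std::printf("new length: %d\n", sum);
        return 0;
    }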
Modified: trunk/Source/_javascript_Core/ftl/FTLOSRExitHandle.cpp (194330 => 194331)
--- trunk/Source/_javascript_Core/ftl/FTLOSRExitHandle.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/ftl/FTLOSRExitHandle.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -39,9 +39,10 @@
label = jit.label();
jit.pushToSaveImmediateWithoutTouchingRegisters(CCallHelpers::TrustedImm32(index));
CCallHelpers::PatchableJump jump = jit.patchableJump();
+ RefPtr<OSRExitHandle> self = this;
jit.addLinkTask(
- [this, jump] (LinkBuffer& linkBuffer) {
- exit.m_patchableJump = CodeLocationJump(linkBuffer.locationOf(jump));
+ [self, jump] (LinkBuffer& linkBuffer) {
+ self->exit.m_patchableJump = CodeLocationJump(linkBuffer.locationOf(jump));
linkBuffer.link(
jump.m_jump,