Diff
Modified: trunk/Source/JavaScriptCore/CMakeLists.txt (194330 => 194331)
--- trunk/Source/JavaScriptCore/CMakeLists.txt 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/CMakeLists.txt 2015-12-21 16:16:01 UTC (rev 194331)
@@ -90,6 +90,7 @@
b3/air/AirSpillEverything.cpp
b3/air/AirStackSlot.cpp
b3/air/AirTmp.cpp
+ b3/air/AirTmpWidth.cpp
b3/air/AirValidate.cpp
b3/B3ArgumentRegValue.cpp
Modified: trunk/Source/JavaScriptCore/ChangeLog (194330 => 194331)
--- trunk/Source/JavaScriptCore/ChangeLog 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/ChangeLog 2015-12-21 16:16:01 UTC (rev 194331)
@@ -1,3 +1,123 @@
+2015-12-21 Filip Pizlo <[email protected]>
+
+ B3->Air lowering incorrectly copy-propagates over ZExt32's
+ https://bugs.webkit.org/show_bug.cgi?id=152365
+
+ Reviewed by Benjamin Poulain.
+
+ The instruction selector thinks that Value's that return Int32's are going to always be lowered
+ to instructions that zero-extend the destination. But this isn't actually true. If you have an
+ Add32 with a destination on the stack (i.e. spilled) then it only writes 4 bytes. Then, the
+ filler will load 8 bytes from the stack at the point of use. So, the use of the Add32 will see
+ garbage in the high bits.
+
+ The fact that the spiller chose to use 8 bytes for a Tmp that gets defined by an Add32 is a
+ pretty sad bug, but:
+
+ - It's entirely up to the spiller to decide how many bytes to use for a Tmp, since we do not
+ ascribe a type to Tmps. We could ascribe types to Tmps, but then coalescing would become
+ harder. Our goal is to fix the bug while still enabling coalescing in cases like "a[i]" where
+ "i" is a 32-bit integer that is computed using operations that already do zero-extension.
+
+ - More broadly, it's strange that the instruction selector decides whether a Value will be
+ lowered to something that zero-extends. That's too constraining, since the most optimal
+ instruction selection might involve something that doesn't zero-extend in cases of spilling, so
+ the zero-extension should only happen if it's actually needed. This means that we need to
+ understand which Air instructions cause zero-extensions.
+
+ - If we know which Air instructions cause zero-extensions, then we don't need the instruction
+ selector to copy-propagate ZExt32's. We have copy-propagation in Air thanks to the register
+ allocator.
+
+ In fact, the register allocator is exactly where all of the pieces come together. It's there that
+ we want to know which operations zero-extend and which don't. It also wants to know how many bits
+ of a Tmp each instruction reads. Armed with that information, the register allocator can emit
+ more optimal spill code, use less stack space for spill slots, and coalesce Move32's. As a bonus,
+ on X86, it replaces Move's with Move32's whenever it can. On X86, Move32 is cheaper.
+
+ This fixes a crash bug in V8/encrypt. After fixing this, I only needed two minor fixes to get
+ V8/encrypt to run. We're about 10% behind LLVM on steady state throughput on this test. It
+ appears to be mostly due to excessive spilling caused by CCall slow paths. That's fixable: we
+ could make CCalls on slow paths use a variant of CCallSpecial that promises not to clobber any
+ registers, and then have it emit spill code around the call itself. LLVM probably gets this
+ optimization from its live range splitting.
+
+ I tried writing a regression test. The problem is that you need garbage on the stack for this to
+ work, and I didn't feel like writing a flaky test. It appears that running V8/encrypt will cover
+ this, so we do have coverage.
+
+ * CMakeLists.txt:
+ * JavaScriptCore.xcodeproj/project.pbxproj:
+ * assembler/AbstractMacroAssembler.h:
+ (JSC::isX86):
+ (JSC::isX86_64):
+ (JSC::optimizeForARMv7IDIVSupported):
+ (JSC::optimizeForX86):
+ (JSC::optimizeForX86_64):
+ * b3/B3LowerToAir.cpp:
+ (JSC::B3::Air::LowerToAir::highBitsAreZero):
+ (JSC::B3::Air::LowerToAir::shouldCopyPropagate):
+ (JSC::B3::Air::LowerToAir::lower):
+ * b3/B3PatchpointSpecial.cpp:
+ (JSC::B3::PatchpointSpecial::forEachArg):
+ * b3/B3StackmapSpecial.cpp:
+ (JSC::B3::StackmapSpecial::forEachArgImpl):
+ * b3/B3Value.h:
+ * b3/air/AirAllocateStack.cpp:
+ (JSC::B3::Air::allocateStack):
+ * b3/air/AirArg.cpp:
+ (WTF::printInternal):
+ * b3/air/AirArg.h:
+ (JSC::B3::Air::Arg::pointerWidth):
+ (JSC::B3::Air::Arg::isAnyUse):
+ (JSC::B3::Air::Arg::isColdUse):
+ (JSC::B3::Air::Arg::isEarlyUse):
+ (JSC::B3::Air::Arg::isDef):
+ (JSC::B3::Air::Arg::isZDef):
+ (JSC::B3::Air::Arg::widthForB3Type):
+ (JSC::B3::Air::Arg::conservativeWidth):
+ (JSC::B3::Air::Arg::minimumWidth):
+ (JSC::B3::Air::Arg::bytes):
+ (JSC::B3::Air::Arg::widthForBytes):
+ (JSC::B3::Air::Arg::Arg):
+ (JSC::B3::Air::Arg::forEachTmp):
+ * b3/air/AirCCallSpecial.cpp:
+ (JSC::B3::Air::CCallSpecial::forEachArg):
+ * b3/air/AirEliminateDeadCode.cpp:
+ (JSC::B3::Air::eliminateDeadCode):
+ * b3/air/AirFixPartialRegisterStalls.cpp:
+ (JSC::B3::Air::fixPartialRegisterStalls):
+ * b3/air/AirInst.cpp:
+ (JSC::B3::Air::Inst::hasArgEffects):
+ * b3/air/AirInst.h:
+ (JSC::B3::Air::Inst::forEachTmpFast):
+ (JSC::B3::Air::Inst::forEachTmp):
+ * b3/air/AirInstInlines.h:
+ (JSC::B3::Air::Inst::forEachTmpWithExtraClobberedRegs):
+ * b3/air/AirIteratedRegisterCoalescing.cpp:
+ * b3/air/AirLiveness.h:
+ (JSC::B3::Air::AbstractLiveness::AbstractLiveness):
+ (JSC::B3::Air::AbstractLiveness::LocalCalc::execute):
+ * b3/air/AirOpcode.opcodes:
+ * b3/air/AirSpillEverything.cpp:
+ (JSC::B3::Air::spillEverything):
+ * b3/air/AirTmpWidth.cpp: Added.
+ (JSC::B3::Air::TmpWidth::TmpWidth):
+ (JSC::B3::Air::TmpWidth::~TmpWidth):
+ * b3/air/AirTmpWidth.h: Added.
+ (JSC::B3::Air::TmpWidth::width):
+ (JSC::B3::Air::TmpWidth::defWidth):
+ (JSC::B3::Air::TmpWidth::useWidth):
+ (JSC::B3::Air::TmpWidth::Widths::Widths):
+ * b3/air/AirUseCounts.h:
+ (JSC::B3::Air::UseCounts::UseCounts):
+ * b3/air/opcode_generator.rb:
+ * b3/testb3.cpp:
+ (JSC::B3::testCheckMegaCombo):
+ (JSC::B3::testCheckTrickyMegaCombo):
+ (JSC::B3::testCheckTwoMegaCombos):
+ (JSC::B3::run):
+
2015-12-21 Andy VanWagoner <[email protected]>
[INTL] Implement String.prototype.localeCompare in ECMA-402
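The following standalone C++ sketch is not part of the patch; it just illustrates the failure mode described in the ChangeLog above, assuming a little-endian 64-bit target: a 32-bit def writes only 4 bytes of an 8-byte spill slot, so an 8-byte fill at the point of use sees garbage in the high bits. All names in it are invented for the illustration.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        // Stale stack contents in an 8-byte spill slot.
        uint64_t spillSlot;
        std::memset(&spillSlot, 0xff, sizeof(spillSlot));

        // An Add32-style def only writes the low 4 bytes of the slot.
        uint32_t add32Result = 2 + 3;
        std::memcpy(&spillSlot, &add32Result, sizeof(add32Result));

        // The filler then loads all 8 bytes at the point of use.
        uint64_t reloaded;
        std::memcpy(&reloaded, &spillSlot, sizeof(reloaded));

        // Prints 0xffffffff00000005 rather than 5: the high bits are garbage.
        std::printf("0x%016llx\n", static_cast<unsigned long long>(reloaded));
        return 0;
    }
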
Modified: trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj (194330 => 194331)
--- trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj 2015-12-21 16:16:01 UTC (rev 194331)
@@ -695,6 +695,8 @@
0FE0502C1AA9095600D33B33 /* VarOffset.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE050231AA9095600D33B33 /* VarOffset.cpp */; };
0FE0502D1AA9095600D33B33 /* VarOffset.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE050241AA9095600D33B33 /* VarOffset.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FE0502F1AAA806900D33B33 /* ScopedArgumentsTable.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE0502E1AAA806900D33B33 /* ScopedArgumentsTable.cpp */; };
+ 0FE0E4AD1C24C94A002E17B6 /* AirTmpWidth.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE0E4AB1C24C94A002E17B6 /* AirTmpWidth.cpp */; };
+ 0FE0E4AE1C24C94A002E17B6 /* AirTmpWidth.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE0E4AC1C24C94A002E17B6 /* AirTmpWidth.h */; };
0FE228ED1436AB2700196C48 /* Options.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE228EB1436AB2300196C48 /* Options.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FE228EE1436AB2C00196C48 /* Options.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE228EA1436AB2300196C48 /* Options.cpp */; };
0FE254F61ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE254F41ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp */; };
@@ -2835,6 +2837,8 @@
0FE050231AA9095600D33B33 /* VarOffset.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = VarOffset.cpp; sourceTree = "<group>"; };
0FE050241AA9095600D33B33 /* VarOffset.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = VarOffset.h; sourceTree = "<group>"; };
0FE0502E1AAA806900D33B33 /* ScopedArgumentsTable.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ScopedArgumentsTable.cpp; sourceTree = "<group>"; };
+ 0FE0E4AB1C24C94A002E17B6 /* AirTmpWidth.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirTmpWidth.cpp; path = b3/air/AirTmpWidth.cpp; sourceTree = "<group>"; };
+ 0FE0E4AC1C24C94A002E17B6 /* AirTmpWidth.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirTmpWidth.h; path = b3/air/AirTmpWidth.h; sourceTree = "<group>"; };
0FE228EA1436AB2300196C48 /* Options.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Options.cpp; sourceTree = "<group>"; };
0FE228EB1436AB2300196C48 /* Options.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Options.h; sourceTree = "<group>"; };
0FE254F41ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGVarargsForwardingPhase.cpp; path = dfg/DFGVarargsForwardingPhase.cpp; sourceTree = "<group>"; };
@@ -4800,6 +4804,8 @@
0FEC85681BDACDC70080FF74 /* AirTmp.cpp */,
0FEC85691BDACDC70080FF74 /* AirTmp.h */,
0FEC856A1BDACDC70080FF74 /* AirTmpInlines.h */,
+ 0FE0E4AB1C24C94A002E17B6 /* AirTmpWidth.cpp */,
+ 0FE0E4AC1C24C94A002E17B6 /* AirTmpWidth.h */,
0F3730921C0D67EE00052BFA /* AirUseCounts.h */,
0FEC856B1BDACDC70080FF74 /* AirValidate.cpp */,
0FEC856C1BDACDC70080FF74 /* AirValidate.h */,
@@ -7296,6 +7302,7 @@
0F235BD917178E1C00690C7F /* FTLExitThunkGenerator.h in Headers */,
0F2B9CF719D0BAC100B1D1B5 /* FTLExitTimeObjectMaterialization.h in Headers */,
0F235BDB17178E1C00690C7F /* FTLExitValue.h in Headers */,
+ 0FE0E4AE1C24C94A002E17B6 /* AirTmpWidth.h in Headers */,
A7F2996C17A0BB670010417A /* FTLFail.h in Headers */,
0FEA0A2C170B661900BB722C /* FTLFormattedValue.h in Headers */,
0FD8A31A17D51F2200CA2C40 /* FTLForOSREntryJITCode.h in Headers */,
@@ -9229,6 +9236,7 @@
0F766D3815AE4A1C008F363E /* StructureStubClearingWatchpoint.cpp in Sources */,
BCCF0D0C0EF0B8A500413C8F /* StructureStubInfo.cpp in Sources */,
705B41AB1A6E501E00716757 /* Symbol.cpp in Sources */,
+ 0FE0E4AD1C24C94A002E17B6 /* AirTmpWidth.cpp in Sources */,
705B41AD1A6E501E00716757 /* SymbolConstructor.cpp in Sources */,
705B41AF1A6E501E00716757 /* SymbolObject.cpp in Sources */,
705B41B11A6E501E00716757 /* SymbolPrototype.cpp in Sources */,
Modified: trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -67,6 +67,15 @@
#endif
}
+inline bool isX86_64()
+{
+#if CPU(X86_64)
+ return true;
+#else
+ return false;
+#endif
+}
+
inline bool optimizeForARMv7IDIVSupported()
{
return isARMv7IDIVSupported() && Options::useArchitectureSpecificOptimizations();
@@ -82,6 +91,11 @@
return isX86() && Options::useArchitectureSpecificOptimizations();
}
+inline bool optimizeForX86_64()
+{
+ return isX86_64() && Options::useArchitectureSpecificOptimizations();
+}
+
class AllowMacroScratchRegisterUsage;
class DisallowMacroScratchRegisterUsage;
class LinkBuffer;
Modified: trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -158,16 +158,12 @@
}
}
- // NOTE: This entire mechanism could be done over Air, if we felt that this would be fast enough.
- // For now we're assuming that it's faster to do this here, since analyzing B3 is so cheap.
bool shouldCopyPropagate(Value* value)
{
switch (value->opcode()) {
case Trunc:
case Identity:
return true;
- case ZExt32:
- return highBitsAreZero(value->child(0));
default:
return false;
}
@@ -1775,11 +1771,6 @@
}
case ZExt32: {
- if (highBitsAreZero(m_value->child(0))) {
- ASSERT(tmp(m_value->child(0)) == tmp(m_value));
- return;
- }
-
appendUnOp<Move32, Air::Oops>(m_value->child(0));
return;
}
Modified: trunk/Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -54,7 +54,7 @@
return;
}
- callback(inst.args[1], Arg::Def, inst.origin->airType());
+ callback(inst.args[1], Arg::Def, inst.origin->airType(), inst.origin->airWidth());
forEachArgImpl(0, 2, inst, SameAsRep, callback);
}
Modified: trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -111,8 +111,9 @@
role = Arg::LateUse;
break;
}
-
- callback(arg, role, Arg::typeForB3Type(child.value()->type()));
+
+ Type type = child.value()->type();
+ callback(arg, role, Arg::typeForB3Type(type), Arg::widthForB3Type(type));
}
}
Modified: trunk/Source/JavaScriptCore/b3/B3Value.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/B3Value.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/B3Value.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -76,6 +76,7 @@
// This is useful when lowering. Note that this is only valid for non-void values.
Air::Arg::Type airType() const { return Air::Arg::typeForB3Type(type()); }
+ Air::Arg::Width airWidth() const { return Air::Arg::widthForB3Type(type()); }
AdjacencyList& children() { return m_children; }
const AdjacencyList& children() const { return m_children; }
Modified: trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -104,7 +104,7 @@
for (BasicBlock* block : code) {
for (Inst& inst : *block) {
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
if (role == Arg::UseAddr && arg.isStack())
escapingStackSlots.add(arg.stackSlot());
});
@@ -148,7 +148,7 @@
dataLog("Interfering: ", WTF::pointerListDump(localCalc.live()), "\n");
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
if (!Arg::isDef(role))
return;
if (!arg.isStack())
Modified: trunk/Source/JavaScriptCore/b3/air/AirArg.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirArg.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirArg.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -178,6 +178,12 @@
case Arg::UseDef:
out.print("UseDef");
return;
+ case Arg::ZDef:
+ out.print("ZDef");
+ return;
+ case Arg::UseZDef:
+ out.print("UseZDef");
+ return;
case Arg::UseAddr:
out.print("UseAddr");
return;
Modified: trunk/Source/JavaScriptCore/b3/air/AirArg.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirArg.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirArg.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -101,9 +101,28 @@
// Like Use of address, Def of address does not mean escape.
Def,
+ // This is a special variant of Def that implies that the upper bits of the target register are
+ // zero-filled. Specifically, if the Width of a ZDef is less than the largest possible width of
+ // the argument (for example, we're on a 64-bit machine and we have a Width32 ZDef of a GPR) then
+ // this has different implications for the upper bits (i.e. the top 32 bits in our example)
+ // depending on the kind of the argument:
+ //
+ // For register: the upper bits are zero-filled.
+ // For address: the upper bits are not touched (i.e. we do a 32-bit store in our example).
+ // For tmp: either the upper bits are not touched or they are zero-filled, and we won't know
+ // which until we lower the tmp to either a StackSlot or a Reg.
+ //
+ // The behavior of ZDef is consistent with what happens when you perform 32-bit operations on a
+ // 64-bit GPR. It's not consistent with what happens with 8-bit or 16-bit Defs on x86 GPRs, or
+ // what happens with float Defs in ARM NEON or X86 SSE. Hence we have both Def and ZDef.
+ ZDef,
+
// This is a combined Use and Def. It means that both things happen.
UseDef,
+ // This is a combined Use and ZDef. It means that both things happen.
+ UseZDef,
+
// This is a special kind of use that is only valid for addresses. It means that the
// instruction will evaluate the address expression and consume the effective address, but it
// will neither load nor store. This is an escaping use, because now the address may be
@@ -126,6 +145,13 @@
Width64
};
+ static Width pointerWidth()
+ {
+ if (sizeof(void*) == 8)
+ return Width64;
+ return Width32;
+ }
+
enum Signedness : int8_t {
Signed,
Unsigned
@@ -139,9 +165,11 @@
case Use:
case ColdUse:
case UseDef:
+ case UseZDef:
case LateUse:
return true;
case Def:
+ case ZDef:
case UseAddr:
return false;
}
@@ -155,7 +183,9 @@
return true;
case Use:
case UseDef:
+ case UseZDef:
case Def:
+ case ZDef:
case UseAddr:
return false;
}
@@ -173,8 +203,10 @@
case Use:
case ColdUse:
case UseDef:
+ case UseZDef:
return true;
case Def:
+ case ZDef:
case UseAddr:
case LateUse:
return false;
@@ -198,10 +230,29 @@
return false;
case Def:
case UseDef:
+ case ZDef:
+ case UseZDef:
return true;
}
}
+ // Returns true if the Role implies that the Inst will ZDef the Arg.
+ static bool isZDef(Role role)
+ {
+ switch (role) {
+ case Use:
+ case ColdUse:
+ case UseAddr:
+ case LateUse:
+ case Def:
+ case UseDef:
+ return false;
+ case ZDef:
+ case UseZDef:
+ return true;
+ }
+ }
+
static Type typeForB3Type(B3::Type type)
{
switch (type) {
@@ -234,6 +285,37 @@
}
}
+ static Width conservativeWidth(Type type)
+ {
+ return type == GP ? pointerWidth() : Width64;
+ }
+
+ static Width minimumWidth(Type type)
+ {
+ return type == GP ? Width8 : Width32;
+ }
+
+ static unsigned bytes(Width width)
+ {
+ return 1 << width;
+ }
+
+ static Width widthForBytes(unsigned bytes)
+ {
+ switch (bytes) {
+ case 0:
+ case 1:
+ return Width8;
+ case 2:
+ return Width16;
+ case 3:
+ case 4:
+ return Width32;
+ default:
+ return Width64;
+ }
+ }
+
Arg()
: m_kind(Invalid)
{
@@ -717,19 +799,19 @@
//
// This defs (%rcx) but uses %rcx.
template<typename Functor>
- void forEachTmp(Role argRole, Type argType, const Functor& functor)
+ void forEachTmp(Role argRole, Type argType, Width argWidth, const Functor& functor)
{
switch (m_kind) {
case Tmp:
ASSERT(isAnyUse(argRole) || isDef(argRole));
- functor(m_base, argRole, argType);
+ functor(m_base, argRole, argType, argWidth);
break;
case Addr:
- functor(m_base, Use, GP);
+ functor(m_base, Use, GP, argRole == UseAddr ? argWidth : pointerWidth());
break;
case Index:
- functor(m_base, Use, GP);
- functor(m_index, Use, GP);
+ functor(m_base, Use, GP, argRole == UseAddr ? argWidth : pointerWidth());
+ functor(m_index, Use, GP, argRole == UseAddr ? argWidth : pointerWidth());
break;
default:
break;
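The width helpers added to Arg above are easy to sanity-check in isolation. The following sketch is not the JSC code; it re-states bytes() and widthForBytes() from the hunk with a standalone Width enum (assumed to start at Width8 == 0, which is what bytes() == 1 << width requires) and verifies the byte/width mapping. pointerWidth() likewise returns Width64 exactly when sizeof(void*) == 8.

    #include <cassert>

    enum Width { Width8, Width16, Width32, Width64 }; // same order as Arg::Width

    unsigned bytes(Width width) { return 1 << width; }

    Width widthForBytes(unsigned bytes)
    {
        switch (bytes) {
        case 0:
        case 1:
            return Width8;
        case 2:
            return Width16;
        case 3:
        case 4:
            return Width32;
        default:
            return Width64;
        }
    }

    int main()
    {
        assert(bytes(Width8) == 1 && bytes(Width16) == 2);
        assert(bytes(Width32) == 4 && bytes(Width64) == 8);
        assert(widthForBytes(3) == Width32);  // rounds up to the next width
        assert(widthForBytes(16) == Width64); // anything past 8 saturates at Width64
        return 0;
    }
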
Modified: trunk/Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -45,16 +45,17 @@
void CCallSpecial::forEachArg(Inst& inst, const ScopedLambda<Inst::EachArgCallback>& callback)
{
for (unsigned i = 0; i < numCalleeArgs; ++i)
- callback(inst.args[calleeArgOffset + i], Arg::Use, Arg::GP);
+ callback(inst.args[calleeArgOffset + i], Arg::Use, Arg::GP, Arg::pointerWidth());
for (unsigned i = 0; i < numReturnGPArgs; ++i)
- callback(inst.args[returnGPArgOffset + i], Arg::Def, Arg::GP);
+ callback(inst.args[returnGPArgOffset + i], Arg::Def, Arg::GP, Arg::pointerWidth());
for (unsigned i = 0; i < numReturnFPArgs; ++i)
- callback(inst.args[returnFPArgOffset + i], Arg::Def, Arg::FP);
+ callback(inst.args[returnFPArgOffset + i], Arg::Def, Arg::FP, Arg::Width64);
for (unsigned i = argArgOffset; i < inst.args.size(); ++i) {
// For the type, we can just query the arg's type. The arg will have a type, because we
// require these args to be argument registers.
- callback(inst.args[i], Arg::Use, inst.args[i].type());
+ Arg::Type type = inst.args[i].type();
+ callback(inst.args[i], Arg::Use, type, Arg::conservativeWidth(type));
}
}
Modified: trunk/Source/JavaScriptCore/b3/air/AirEliminateDeadCode.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirEliminateDeadCode.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirEliminateDeadCode.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -80,7 +80,7 @@
// This instruction should be presumed dead, if its Args are all dead.
bool storesToLive = false;
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
if (!Arg::isDef(role))
return;
storesToLive |= isArgLive(arg);
Modified: trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -113,7 +113,7 @@
return;
}
- inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type) {
+ inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type, Arg::Width) {
ASSERT_WITH_MESSAGE(tmp.isReg(), "This phase must be run after register allocation.");
if (tmp.isFPR() && Arg::isDef(role))
@@ -203,7 +203,7 @@
if (hasPartialXmmRegUpdate(inst)) {
RegisterSet defs;
RegisterSet uses;
- inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type) {
+ inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type, Arg::Width) {
if (tmp.isFPR()) {
if (Arg::isDef(role))
defs.set(tmp.fpr());
Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirInst.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -38,7 +38,7 @@
{
bool result = false;
forEachArg(
- [&] (Arg&, Arg::Role role, Arg::Type) {
+ [&] (Arg&, Arg::Role role, Arg::Type, Arg::Width) {
if (Arg::isDef(role))
result = true;
});
Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirInst.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -98,20 +98,20 @@
arg.forEachTmpFast(functor);
}
- typedef void EachArgCallback(Arg&, Arg::Role, Arg::Type);
+ typedef void EachArgCallback(Arg&, Arg::Role, Arg::Type, Arg::Width);
- // Calls the functor with (arg, role, type). This function is auto-generated by
+ // Calls the functor with (arg, role, type, width). This function is auto-generated by
// opcode_generator.rb.
template<typename Functor>
void forEachArg(const Functor&);
- // Calls the functor with (tmp, role, type).
+ // Calls the functor with (tmp, role, type, width).
template<typename Functor>
void forEachTmp(const Functor& functor)
{
forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type type) {
- arg.forEachTmp(role, type, functor);
+ [&] (Arg& arg, Arg::Role role, Arg::Type type, Arg::Width width) {
+ arg.forEachTmp(role, type, width, functor);
});
}
Modified: trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -55,7 +55,7 @@
static void forEach(Inst& inst, const Functor& functor)
{
inst.forEachArg(
- [&] (Arg& arg, Arg::Role role, Arg::Type type) {
+ [&] (Arg& arg, Arg::Role role, Arg::Type type, Arg::Width width) {
if (!arg.isStack())
return;
StackSlot* stackSlot = arg.stackSlot();
@@ -66,7 +66,7 @@
// semantics of "Anonymous".
// https://bugs.webkit.org/show_bug.cgi?id=151128
- functor(stackSlot, role, type);
+ functor(stackSlot, role, type, width);
arg = Arg::stack(stackSlot, arg.offset());
});
}
@@ -99,12 +99,13 @@
inline void Inst::forEachTmpWithExtraClobberedRegs(Inst* nextInst, const Functor& functor)
{
forEachTmp(
- [&] (Tmp& tmpArg, Arg::Role role, Arg::Type argType) {
- functor(tmpArg, role, argType);
+ [&] (Tmp& tmpArg, Arg::Role role, Arg::Type argType, Arg::Width argWidth) {
+ functor(tmpArg, role, argType, argWidth);
});
auto reportReg = [&] (Reg reg) {
- functor(Tmp(reg), Arg::Def, reg.isGPR() ? Arg::GP : Arg::FP);
+ Arg::Type type = reg.isGPR() ? Arg::GP : Arg::FP;
+ functor(Tmp(reg), Arg::Def, type, Arg::conservativeWidth(type));
};
if (hasSpecial())
Modified: trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -35,6 +35,7 @@
#include "AirPhaseScope.h"
#include "AirRegisterPriority.h"
#include "AirTmpInlines.h"
+#include "AirTmpWidth.h"
#include "AirUseCounts.h"
#include <wtf/ListDump.h>
#include <wtf/ListHashSet.h>
@@ -630,9 +631,10 @@
template<Arg::Type type>
class ColoringAllocator : public AbstractColoringAllocator<unsigned> {
public:
- ColoringAllocator(Code& code, const UseCounts<Tmp>& useCounts, const HashSet<unsigned>& unspillableTmp)
+ ColoringAllocator(Code& code, TmpWidth& tmpWidth, const UseCounts<Tmp>& useCounts, const HashSet<unsigned>& unspillableTmp)
: AbstractColoringAllocator<unsigned>(regsInPriorityOrder(type), AbsoluteTmpMapper<type>::lastMachineRegisterIndex(), tmpArraySize(code))
, m_code(code)
+ , m_tmpWidth(tmpWidth)
, m_useCounts(useCounts)
, m_unspillableTmps(unspillableTmp)
{
@@ -646,9 +648,17 @@
return AbsoluteTmpMapper<type>::tmpFromAbsoluteIndex(getAlias(AbsoluteTmpMapper<type>::absoluteIndex(tmp)));
}
+ // This tells you if a Move will be coalescable if the src and dst end up matching. This method
+ // relies on an analysis that is invalidated by register allocation, so it's only meaningful to
+ // call this *before* replacing the Tmp's in this Inst with registers or spill slots.
+ bool mayBeCoalescable(const Inst& inst) const
+ {
+ return mayBeCoalescableImpl(inst, &m_tmpWidth);
+ }
+
bool isUselessMove(const Inst& inst) const
{
- return mayBeCoalescable(inst) && inst.args[0].tmp() == inst.args[1].tmp();
+ return mayBeCoalescableImpl(inst, nullptr) && inst.args[0].tmp() == inst.args[1].tmp();
}
Tmp getAliasWhenSpilling(Tmp tmp) const
@@ -770,14 +780,14 @@
{
inst.forEachTmpWithExtraClobberedRegs(
nextInst,
- [&] (const Tmp& arg, Arg::Role role, Arg::Type argType) {
+ [&] (const Tmp& arg, Arg::Role role, Arg::Type argType, Arg::Width) {
if (!Arg::isDef(role) || argType != type)
return;
// All the Def()s interfere with each other and with all the extra clobbered Tmps.
// We should not use forEachDefAndExtraClobberedTmp() here since colored Tmps
// do not need interference edges in our implementation.
- inst.forEachTmp([&] (Tmp& otherArg, Arg::Role role, Arg::Type argType) {
+ inst.forEachTmp([&] (Tmp& otherArg, Arg::Role role, Arg::Type argType, Arg::Width) {
if (!Arg::isDef(role) || argType != type)
return;
@@ -791,7 +801,7 @@
// coalesce the Move even if the two Tmp never interfere anywhere.
Tmp defTmp;
Tmp useTmp;
- inst.forEachTmp([&defTmp, &useTmp] (Tmp& argTmp, Arg::Role role, Arg::Type) {
+ inst.forEachTmp([&defTmp, &useTmp] (Tmp& argTmp, Arg::Role role, Arg::Type, Arg::Width) {
if (Arg::isDef(role))
defTmp = argTmp;
else {
@@ -839,7 +849,7 @@
// All the Def()s interfere with everything live.
inst.forEachTmpWithExtraClobberedRegs(
nextInst,
- [&] (const Tmp& arg, Arg::Role role, Arg::Type argType) {
+ [&] (const Tmp& arg, Arg::Role role, Arg::Type argType, Arg::Width) {
if (!Arg::isDef(role) || argType != type)
return;
@@ -857,12 +867,15 @@
addEdge(AbsoluteTmpMapper<type>::absoluteIndex(a), AbsoluteTmpMapper<type>::absoluteIndex(b));
}
- bool mayBeCoalescable(const Inst& inst) const
+ // Calling this without a tmpWidth will perform a more conservative coalescing analysis that assumes
+ // that Move32's are not coalescable.
+ static bool mayBeCoalescableImpl(const Inst& inst, TmpWidth* tmpWidth)
{
switch (type) {
case Arg::GP:
switch (inst.opcode) {
case Move:
+ case Move32:
break;
default:
return false;
@@ -887,6 +900,22 @@
ASSERT(inst.args[0].type() == type);
ASSERT(inst.args[1].type() == type);
+ // We can coalesce a Move32 so long as either of the following holds:
+ // - The input is already zero-filled.
+ // - The output only cares about the low 32 bits.
+ //
+ // Note that the input property requires an analysis over ZDef's, so it's only valid so long
+ // as the input gets a register. We don't know if the input gets a register, but we do know
+ // that if it doesn't get a register then we will still emit this Move32.
+ if (inst.opcode == Move32) {
+ if (!tmpWidth)
+ return false;
+
+ if (tmpWidth->defWidth(inst.args[0].tmp()) > Arg::Width32
+ && tmpWidth->useWidth(inst.args[1].tmp()) > Arg::Width32)
+ return false;
+ }
+
return true;
}
@@ -1024,6 +1053,7 @@
using AbstractColoringAllocator<unsigned>::getAlias;
Code& m_code;
+ TmpWidth& m_tmpWidth;
// FIXME: spilling should not be type specific. It is only a side effect of using UseCounts.
const UseCounts<Tmp>& m_useCounts;
const HashSet<unsigned>& m_unspillableTmps;
@@ -1053,7 +1083,23 @@
HashSet<unsigned> unspillableTmps;
while (true) {
++m_numIterations;
- ColoringAllocator<type> allocator(m_code, m_useCounts, unspillableTmps);
+
+ // FIXME: One way to optimize this code is to remove the recomputation inside the fixpoint.
+ // We need to recompute because spilling adds tmps, but we could just update tmpWidth when we
+ // add those tmps. Note that one easy way to remove the recomputation is to make any newly
+ // added Tmps get the same use/def widths that the original Tmp got. But, this may hurt the
+ // spill code we emit. Since we currently recompute TmpWidth after spilling, the newly
+ // created Tmps may get narrower use/def widths. On the other hand, the spiller already
+ // selects which move instruction to use based on the original Tmp's widths, so it may not
+ // matter that a subsequent iteration sees a conservative width for the new Tmps. Also, the
+ // recomputation may not actually be a performance problem; it's likely that a better way to
+ // improve performance of TmpWidth is to replace its HashMap with something else. It's
+ // possible that most of the TmpWidth overhead is from queries of TmpWidth rather than the
+ // recomputation, in which case speeding up the lookup would be a bigger win.
+ // https://bugs.webkit.org/show_bug.cgi?id=152478
+ m_tmpWidth.recompute(m_code);
+
+ ColoringAllocator<type> allocator(m_code, m_tmpWidth, m_useCounts, unspillableTmps);
if (!allocator.requiresSpilling()) {
assignRegistersToTmp(allocator);
return;
@@ -1069,6 +1115,22 @@
// Give Tmp a valid register.
for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
Inst& inst = block->at(instIndex);
+
+ // The mayBeCoalescable() method will change its mind for some operations after we
+ // complete register allocation. So, we record this before starting.
+ bool mayBeCoalescable = allocator.mayBeCoalescable(inst);
+
+ // On X86_64, Move32 is cheaper if we know that it's equivalent to a Move. It's
+ // equivalent if the destination's high bits are not observable or if the source's high
+ // bits are all zero. Note that we don't have the opposite optimization for other
+ // architectures, which may prefer Move over Move32, because Move is canonical already.
+ if (type == Arg::GP && optimizeForX86_64() && inst.opcode == Move
+ && inst.args[0].isTmp() && inst.args[1].isTmp()) {
+ if (m_tmpWidth.useWidth(inst.args[1].tmp()) <= Arg::Width32
+ || m_tmpWidth.defWidth(inst.args[0].tmp()) <= Arg::Width32)
+ inst.opcode = Move32;
+ }
+
inst.forEachTmpFast([&] (Tmp& tmp) {
if (tmp.isReg() || tmp.isGP() == (type != Arg::GP))
return;
@@ -1085,11 +1147,15 @@
ASSERT(assignedTmp.isReg());
tmp = assignedTmp;
});
+
+ if (mayBeCoalescable && inst.args[0].isTmp() && inst.args[1].isTmp()
+ && inst.args[0].tmp() == inst.args[1].tmp())
+ inst = Inst();
}
// Remove all the useless moves we created in this block.
block->insts().removeAllMatching([&] (const Inst& inst) {
- return allocator.isUselessMove(inst);
+ return !inst;
});
}
}
@@ -1103,7 +1169,9 @@
unspillableTmps.add(AbsoluteTmpMapper<type>::absoluteIndex(tmp));
// Allocate stack slot for each spilled value.
- bool isNewTmp = stackSlots.add(tmp, m_code.addStackSlot(8, StackSlotKind::Anonymous)).isNewEntry;
+ StackSlot* stackSlot = m_code.addStackSlot(
+ m_tmpWidth.width(tmp) <= Arg::Width32 ? 4 : 8, StackSlotKind::Anonymous);
+ bool isNewTmp = stackSlots.add(tmp, stackSlot).isNewEntry;
ASSERT_UNUSED(isNewTmp, isNewTmp);
}
@@ -1115,18 +1183,36 @@
for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
Inst& inst = block->at(instIndex);
+ // The TmpWidth analysis will say that a Move only stores 32 bits into the destination,
+ // if the source only had 32 bits worth of non-zero bits. Same for the source: it will
+ // only claim to read 32 bits from the source if only 32 bits of the destination are
+ // read. Note that we only apply this logic if this turns into a load or store, since
+ // Move is the canonical way to move data between GPRs.
+ bool forceMove32IfDidSpill = false;
+ bool didSpill = false;
+ if (type == Arg::GP && inst.opcode == Move) {
+ if (m_tmpWidth.defWidth(inst.args[0].tmp()) <= Arg::Width32
+ || m_tmpWidth.useWidth(inst.args[1].tmp()) <= Arg::Width32)
+ forceMove32IfDidSpill = true;
+ }
+
// Try to replace the register use by memory use when possible.
for (unsigned i = 0; i < inst.args.size(); ++i) {
Arg& arg = inst.args[i];
if (arg.isTmp() && arg.type() == type && !arg.isReg()) {
auto stackSlotEntry = stackSlots.find(arg.tmp());
- if (stackSlotEntry != stackSlots.end() && inst.admitsStack(i))
+ if (stackSlotEntry != stackSlots.end() && inst.admitsStack(i)) {
arg = Arg::stack(stackSlotEntry->value);
+ didSpill = true;
+ }
}
}
+ if (didSpill && forceMove32IfDidSpill)
+ inst.opcode = Move32;
+
// For every other case, add Load/Store as needed.
- inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type argType) {
+ inst.forEachTmp([&] (Tmp& tmp, Arg::Role role, Arg::Type argType, Arg::Width) {
if (tmp.isReg() || argType != type)
return;
@@ -1141,7 +1227,18 @@
}
Arg arg = Arg::stack(stackSlotEntry->value);
- Opcode move = type == Arg::GP ? Move : MoveDouble;
+ Opcode move = Oops;
+ switch (stackSlotEntry->value->byteSize()) {
+ case 4:
+ move = type == Arg::GP ? Move32 : MoveFloat;
+ break;
+ case 8:
+ move = type == Arg::GP ? Move : MoveDouble;
+ break;
+ default:
+ RELEASE_ASSERT_NOT_REACHED();
+ break;
+ }
if (Arg::isAnyUse(role)) {
Tmp newTmp = m_code.newTmp(type);
@@ -1166,6 +1263,7 @@
}
Code& m_code;
+ TmpWidth m_tmpWidth;
UseCounts<Tmp> m_useCounts;
unsigned m_numIterations { 0 };
};
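To make the two width-based decisions in the hunk above concrete, here is a standalone sketch, not the JSC code, with invented helper names: a Move32 between Tmps may be coalesced unless the source may carry non-zero high bits and the destination's high bits are observable, and a spilled Tmp gets a 4-byte stack slot exactly when its width is at most Width32.

    enum Width { Width8, Width16, Width32, Width64 };

    // Hypothetical helpers named for this sketch; the real code queries TmpWidth directly.
    bool move32MayBeCoalesced(Width srcDefWidth, Width dstUseWidth)
    {
        // Mirrors mayBeCoalescableImpl(): bail only if the source may have non-zero
        // high bits *and* the destination's high bits are observable.
        return !(srcDefWidth > Width32 && dstUseWidth > Width32);
    }

    unsigned spillSlotBytes(Width tmpWidth)
    {
        // Mirrors the stack slot allocation above: narrow Tmps get 4-byte slots.
        return tmpWidth <= Width32 ? 4 : 8;
    }

    int main()
    {
        bool ok = move32MayBeCoalesced(Width32, Width64)  // source already zero-filled
            && !move32MayBeCoalesced(Width64, Width64)    // neither condition holds
            && spillSlotBytes(Width32) == 4
            && spillSlotBytes(Width64) == 8;
        return ok ? 0 : 1;
    }
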
Modified: trunk/Source/JavaScriptCore/b3/air/AirLiveness.h (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirLiveness.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirLiveness.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -90,7 +90,7 @@
typename Adapter::IndexSet& liveAtTail = m_liveAtTail[block];
block->last().forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isLateUse(role) && Adapter::acceptsType(type))
liveAtTail.add(Adapter::valueToIndex(thing));
});
@@ -216,14 +216,14 @@
auto& workset = m_liveness.m_workset;
// First handle def's.
inst.forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isDef(role) && Adapter::acceptsType(type))
workset.remove(Adapter::valueToIndex(thing));
});
// Then handle use's.
inst.forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isEarlyUse(role) && Adapter::acceptsType(type))
workset.add(Adapter::valueToIndex(thing));
});
@@ -232,7 +232,7 @@
if (instIndex) {
Inst& prevInst = m_block->at(instIndex - 1);
prevInst.forEach<typename Adapter::Thing>(
- [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type) {
+ [&] (typename Adapter::Thing& thing, Arg::Role role, Arg::Type type, Arg::Width) {
if (Arg::isLateUse(role) && Adapter::acceptsType(type))
workset.add(Adapter::valueToIndex(thing));
});
Modified: trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes 2015-12-21 16:16:01 UTC (rev 194331)
@@ -23,14 +23,17 @@
# Syllabus:
#
-# Roles and types:
-# U:G => use of a general-purpose register or value
-# D:G => def of a general-purpose register or value
-# UD:G => use and def of a general-purpose register or value
-# UA:G => UseAddr (see comment in Arg.h)
-# U:F => use of a float register or value
-# D:F => def of a float register or value
-# UD:F => use and def of a float register or value
+# Examples of some roles, types, and widths:
+# U:G:32 => use of the low 32 bits of a general-purpose register or value
+# D:G:32 => def of the low 32 bits of a general-purpose register or value
+# UD:G:32 => use and def of the low 32 bits of a general-purpose register or value
+# U:G:64 => use of the low 64 bits of a general-purpose register or value
+# ZD:G:32 => def of all bits of a general-purpose register, where all but the low 32 bits are guaranteed to be zeroed.
+# UA:G:Ptr => UseAddr (see comment in Arg.h)
+# U:F:32 => use of a float register or value
+# U:F:64 => use of a double register or value
+# D:F:32 => def of a float register or value
+# UD:F:32 => use and def of a float register or value
#
# Argument kinds:
# Tmp => temporary or register
@@ -44,11 +47,11 @@
# of things. So, although this file uses a particular indentation style, none of the whitespace or
# even newlines are meaningful to the parser. For example, you could write:
#
-# Foo42 U:G, UD:F Imm, Tmp Addr, Tmp
+# Foo42 U:G:32, UD:F:32 Imm, Tmp Addr, Tmp
#
# And the parser would know that this is the same as:
#
-# Foo42 U:G, UD:F
+# Foo42 U:G:32, UD:F:32
# Imm, Tmp
# Addr, Tmp
#
@@ -58,22 +61,22 @@
# union of those architectures. For example, if this is the only overload of the opcode, then it makes the
# opcode only available on x86_64:
#
-# x86_64: Fuzz UD:G, D:G
+# x86_64: Fuzz UD:G:64, D:G:64
# Tmp, Tmp
# Tmp, Addr
#
# But this only restricts the two-operand form, the other form is allowed on all architectures:
#
-# x86_64: Fuzz UD:G, D:G
+# x86_64: Fuzz UD:G:64, D:G:64
# Tmp, Tmp
# Tmp, Addr
-# Fuzz UD:G, D:G, U:F
+# Fuzz UD:G:Ptr, D:G:Ptr, U:F:Ptr
# Tmp, Tmp, Tmp
# Tmp, Addr, Tmp
#
# And you can also restrict individual forms:
#
-# Thingy UD:G, D:G
+# Thingy UD:G:32, D:G:32
# Tmp, Tmp
# arm64: Tmp, Addr
#
@@ -81,7 +84,7 @@
# form. In this example, the version that takes an address is only available on armv7 while the other
# versions are available on armv7 or x86_64:
#
-# x86_64 armv7: Buzz U:G, UD:F
+# x86_64 armv7: Buzz U:G:32, UD:F:32
# Tmp, Tmp
# Imm, Tmp
# armv7: Addr, Tmp
@@ -103,214 +106,214 @@
Nop
-Add32 U:G, UD:G
+Add32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-Add32 U:G, U:G, D:G
+Add32 U:G:32, U:G:32, ZD:G:32
Imm, Tmp, Tmp
Tmp, Tmp, Tmp
-64: Add64 U:G, UD:G
+64: Add64 U:G:64, UD:G:64
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-64: Add64 U:G, U:G, D:G
+64: Add64 U:G:64, U:G:64, D:G:64
Imm, Tmp, Tmp
Tmp, Tmp, Tmp
-AddDouble U:F, UD:F
+AddDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-AddFloat U:F, UD:F
+AddFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-Sub32 U:G, UD:G
+Sub32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-64: Sub64 U:G, UD:G
+64: Sub64 U:G:64, UD:G:64
Tmp, Tmp
x86: Imm, Addr
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
-SubDouble U:F, UD:F
+SubDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-SubFloat U:F, UD:F
+SubFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-Neg32 UD:G
+Neg32 UZD:G:32
Tmp
Addr
-64: Neg64 UD:G
+64: Neg64 UD:G:64
Tmp
-Mul32 U:G, UD:G
+Mul32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Addr, Tmp
-Mul32 U:G, U:G, D:G
+Mul32 U:G:32, U:G:32, ZD:G:32
Imm, Tmp, Tmp
-64: Mul64 U:G, UD:G
+64: Mul64 U:G:64, UD:G:64
Tmp, Tmp
-MulDouble U:F, UD:F
+MulDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-MulFloat U:F, UD:F
+MulFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-DivDouble U:F, UD:F
+DivDouble U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-DivFloat U:F, UD:F
+DivFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-x86: X86ConvertToDoubleWord32 U:G, D:G
+x86: X86ConvertToDoubleWord32 U:G:32, ZD:G:32
Tmp*, Tmp*
-x86_64: X86ConvertToQuadWord64 U:G, D:G
+x86_64: X86ConvertToQuadWord64 U:G:64, D:G:64
Tmp*, Tmp*
-x86: X86Div32 UD:G, UD:G, U:G
+x86: X86Div32 UZD:G:32, UZD:G:32, U:G:32
Tmp*, Tmp*, Tmp
-x86_64: X86Div64 UD:G, UD:G, U:G
+x86_64: X86Div64 UZD:G:64, UZD:G:64, U:G:64
Tmp*, Tmp*, Tmp
-Lea UA:G, D:G
+Lea UA:G:Ptr, D:G:Ptr
Addr, Tmp
-And32 U:G, UD:G
+And32 U:G:32, UZD:G:32
Tmp, Tmp
Imm, Tmp
x86: Tmp, Addr
x86: Addr, Tmp
x86: Imm, Addr
-64: And64 U:G, UD:G
+64: And64 U:G:64, UD:G:64
Tmp, Tmp
Imm, Tmp
-AndDouble U:F, UD:F
+AndDouble U:F:64, UD:F:64
Tmp, Tmp
-AndFloat U:F, UD:F
+AndFloat U:F:32, UD:F:32
Tmp, Tmp
-Lshift32 U:G, UD:G
+Lshift32 U:G:32, UZD:G:32
Tmp*, Tmp
Imm, Tmp
-64: Lshift64 U:G, UD:G
+64: Lshift64 U:G:64, UD:G:64
Tmp*, Tmp
Imm, Tmp
-Rshift32 U:G, UD:G
+Rshift32 U:G:32, UZD:G:32
Tmp*, Tmp
Imm, Tmp
-64: Rshift64 U:G, UD:G
+64: Rshift64 U:G:64, UD:G:64
Tmp*, Tmp
Imm, Tmp
-Urshift32 U:G, UD:G
+Urshift32 U:G:32, UZD:G:32
Tmp*, Tmp
Imm, Tmp
-64: Urshift64 U:G, UD:G
+64: Urshift64 U:G:64, UD:G:64
Tmp*, Tmp
Imm, Tmp
-Or32 U:G, UD:G
+Or32 U:G:32, UZD:G:32
Tmp, Tmp
Imm, Tmp
x86: Tmp, Addr
x86: Addr, Tmp
x86: Imm, Addr
-64: Or64 U:G, UD:G
+64: Or64 U:G:64, UD:G:64
Tmp, Tmp
Imm, Tmp
-Xor32 U:G, UD:G
+Xor32 U:G:32, UZD:G:32
Tmp, Tmp
Imm, Tmp
x86: Tmp, Addr
x86: Addr, Tmp
x86: Imm, Addr
-64: Xor64 U:G, UD:G
+64: Xor64 U:G:64, UD:G:64
Tmp, Tmp
x86: Tmp, Addr
Imm, Tmp
-Not32 UD:G
+Not32 UZD:G:32
Tmp
x86: Addr
-64: Not64 UD:G
+64: Not64 UD:G:64
Tmp
x86: Addr
-CeilDouble U:F, UD:F
+CeilDouble U:F:64, UD:F:64
Tmp, Tmp
Addr, Tmp
-CeilFloat U:F, UD:F
+CeilFloat U:F:32, UD:F:32
Tmp, Tmp
Addr, Tmp
-SqrtDouble U:F, UD:F
+SqrtDouble U:F:64, UD:F:64
Tmp, Tmp
x86: Addr, Tmp
-SqrtFloat U:F, UD:F
+SqrtFloat U:F:32, UD:F:32
Tmp, Tmp
x86: Addr, Tmp
-ConvertInt32ToDouble U:G, D:F
+ConvertInt32ToDouble U:G:32, D:F:64
Tmp, Tmp
x86: Addr, Tmp
-64: ConvertInt64ToDouble U:G, D:F
+64: ConvertInt64ToDouble U:G:64, D:F:64
Tmp, Tmp
-CountLeadingZeros32 U:G, D:G
+CountLeadingZeros32 U:G:32, ZD:G:32
Tmp, Tmp
x86: Addr, Tmp
-64: CountLeadingZeros64 U:G, D:G
+64: CountLeadingZeros64 U:G:64, D:G:64
Tmp, Tmp
x86: Addr, Tmp
-ConvertDoubleToFloat U:F, D:F
+ConvertDoubleToFloat U:F:64, D:F:32
Tmp, Tmp
x86: Addr, Tmp
-ConvertFloatToDouble U:F, D:F
+ConvertFloatToDouble U:F:32, D:F:64
Tmp, Tmp
x86: Addr, Tmp
@@ -318,7 +321,7 @@
# the platform. I'm not entirely sure that this is a good thing; it might be better to just have a
# Move64 instruction. OTOH, our MacroAssemblers already have this notion of "move()" that basically
# means movePtr.
-Move U:G, D:G
+Move U:G:Ptr, D:G:Ptr
Tmp, Tmp
Imm, Tmp as signExtend32ToPtr
Imm64, Tmp
@@ -328,7 +331,7 @@
Tmp, Index as storePtr
Imm, Addr as storePtr
-Move32 U:G, D:G
+Move32 U:G:32, ZD:G:32
Tmp, Tmp as zeroExtend32ToPtr
Addr, Tmp as load32
Index, Tmp as load32
@@ -337,118 +340,118 @@
Imm, Addr as store32
Imm, Index as store32
-SignExtend32ToPtr U:G, D:G
+SignExtend32ToPtr U:G:32, D:G:Ptr
Tmp, Tmp
-ZeroExtend8To32 U:G, D:G
+ZeroExtend8To32 U:G:8, ZD:G:32
Tmp, Tmp
Addr, Tmp as load8
Index, Tmp as load8
-SignExtend8To32 U:G, D:G
+SignExtend8To32 U:G:8, ZD:G:32
Tmp, Tmp
x86: Addr, Tmp as load8SignedExtendTo32
Index, Tmp as load8SignedExtendTo32
-ZeroExtend16To32 U:G, D:G
+ZeroExtend16To32 U:G:16, ZD:G:32
Tmp, Tmp
Addr, Tmp as load16
Index, Tmp as load16
-SignExtend16To32 U:G, D:G
+SignExtend16To32 U:G:16, ZD:G:32
Tmp, Tmp
Addr, Tmp as load16SignedExtendTo32
Index, Tmp as load16SignedExtendTo32
-MoveFloat U:F, D:F
+MoveFloat U:F:32, D:F:32
Tmp, Tmp as moveDouble
Addr, Tmp as loadFloat
Index, Tmp as loadFloat
Tmp, Addr as storeFloat
Tmp, Index as storeFloat
-MoveDouble U:F, D:F
+MoveDouble U:F:64, D:F:64
Tmp, Tmp
Addr, Tmp as loadDouble
Index, Tmp as loadDouble
Tmp, Addr as storeDouble
Tmp, Index as storeDouble
-MoveZeroToDouble D:F
+MoveZeroToDouble D:F:64
Tmp
-64: Move64ToDouble U:G, D:F
+64: Move64ToDouble U:G:64, D:F:64
Tmp, Tmp
Addr, Tmp as loadDouble
Index, Tmp as loadDouble
-MoveInt32ToPacked U:G, D:F
+MoveInt32ToPacked U:G:32, D:F:32
Tmp, Tmp
Addr, Tmp as loadFloat
Index, Tmp as loadFloat
-64: MoveDoubleTo64 U:F, D:G
+64: MoveDoubleTo64 U:F:64, D:G:64
Tmp, Tmp
Addr, Tmp as load64
Index, Tmp as load64
-MovePackedToInt32 U:F, D:G
+MovePackedToInt32 U:F:32, D:G:32
Tmp, Tmp
Addr, Tmp as load32
Index, Tmp as load32
-Load8 U:G, D:G
+Load8 U:G:8, ZD:G:32
Addr, Tmp
Index, Tmp
-Store8 U:G, D:G
+Store8 U:G:8, D:G:8
Tmp, Index
Tmp, Addr
Imm, Index
Imm, Addr
-Load8SignedExtendTo32 U:G, D:G
+Load8SignedExtendTo32 U:G:8, ZD:G:32
Addr, Tmp
Index, Tmp
-Load16 U:G, D:G
+Load16 U:G:16, ZD:G:32
Addr, Tmp
Index, Tmp
-Load16SignedExtendTo32 U:G, D:G
+Load16SignedExtendTo32 U:G:16, ZD:G:32
Addr, Tmp
Index, Tmp
-Compare32 U:G, U:G, U:G, D:G
+Compare32 U:G:32, U:G:32, U:G:32, ZD:G:32
RelCond, Tmp, Tmp, Tmp
RelCond, Tmp, Imm, Tmp
-64: Compare64 U:G, U:G, U:G, D:G
+64: Compare64 U:G:32, U:G:64, U:G:64, ZD:G:32
RelCond, Tmp, Imm, Tmp
RelCond, Tmp, Tmp, Tmp
-Test32 U:G, U:G, U:G, D:G
+Test32 U:G:32, U:G:32, U:G:32, ZD:G:32
x86: ResCond, Addr, Imm, Tmp
ResCond, Tmp, Tmp, Tmp
-64: Test64 U:G, U:G, U:G, D:G
+64: Test64 U:G:32, U:G:64, U:G:64, ZD:G:32
ResCond, Tmp, Imm, Tmp
ResCond, Tmp, Tmp, Tmp
-CompareDouble U:G, U:F, U:F, D:G
+CompareDouble U:G:32, U:F:64, U:F:64, ZD:G:32
DoubleCond, Tmp, Tmp, Tmp
-CompareFloat U:G, U:F, U:F, D:G
+CompareFloat U:G:32, U:F:32, U:F:32, ZD:G:32
DoubleCond, Tmp, Tmp, Tmp
# Note that branches have some logic in AirOptimizeBlockOrder.cpp. If you add new branches, please make sure
# you opt them into the block order optimizations.
-Branch8 U:G, U:G, U:G /branch
+Branch8 U:G:32, U:G:8, U:G:8 /branch
x86: RelCond, Addr, Imm
x86: RelCond, Index, Imm
-Branch32 U:G, U:G, U:G /branch
+Branch32 U:G:32, U:G:32, U:G:32 /branch
x86: RelCond, Addr, Imm
RelCond, Tmp, Tmp
RelCond, Tmp, Imm
@@ -456,17 +459,17 @@
x86: RelCond, Addr, Tmp
x86: RelCond, Index, Imm
-64: Branch64 U:G, U:G, U:G /branch
+64: Branch64 U:G:32, U:G:64, U:G:64 /branch
RelCond, Tmp, Tmp
x86: RelCond, Tmp, Addr
x86: RelCond, Addr, Tmp
x86: RelCond, Index, Tmp
-BranchTest8 U:G, U:G, U:G /branch
+BranchTest8 U:G:32, U:G:8, U:G:8 /branch
x86: ResCond, Addr, Imm
x86: ResCond, Index, Imm
-BranchTest32 U:G, U:G, U:G /branch
+BranchTest32 U:G:32, U:G:32, U:G:32 /branch
ResCond, Tmp, Tmp
ResCond, Tmp, Imm
x86: ResCond, Addr, Imm
@@ -474,95 +477,95 @@
# Warning: forms that take an immediate will sign-extend their immediate. You probably want
# BranchTest32 in most cases where you use an immediate.
-64: BranchTest64 U:G, U:G, U:G /branch
+64: BranchTest64 U:G:32, U:G:64, U:G:64 /branch
ResCond, Tmp, Tmp
ResCond, Tmp, Imm
x86: ResCond, Addr, Imm
x86: ResCond, Addr, Tmp
x86: ResCond, Index, Imm
-BranchDouble U:G, U:F, U:F /branch
+BranchDouble U:G:32, U:F:64, U:F:64 /branch
DoubleCond, Tmp, Tmp
-BranchFloat U:G, U:F, U:F /branch
+BranchFloat U:G:32, U:F:32, U:F:32 /branch
DoubleCond, Tmp, Tmp
-BranchAdd32 U:G, U:G, UD:G /branch
+BranchAdd32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
ResCond, Imm, Tmp
x86: ResCond, Imm, Addr
x86: ResCond, Tmp, Addr
x86: ResCond, Addr, Tmp
-64: BranchAdd64 U:G, U:G, UD:G /branch
+64: BranchAdd64 U:G:32, U:G:64, UD:G:64 /branch
ResCond, Imm, Tmp
ResCond, Tmp, Tmp
-BranchMul32 U:G, U:G, UD:G /branch
+BranchMul32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
x86: ResCond, Addr, Tmp
-BranchMul32 U:G, U:G, U:G, D:G /branch
+BranchMul32 U:G:32, U:G:32, U:G:32, ZD:G:32 /branch
ResCond, Tmp, Imm, Tmp
-64: BranchMul64 U:G, U:G, UD:G /branch
+64: BranchMul64 U:G:32, U:G:64, UZD:G:64 /branch
ResCond, Tmp, Tmp
-BranchSub32 U:G, U:G, UD:G /branch
+BranchSub32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
ResCond, Imm, Tmp
x86: ResCond, Imm, Addr
x86: ResCond, Tmp, Addr
x86: ResCond, Addr, Tmp
-64: BranchSub64 U:G, U:G, UD:G /branch
+64: BranchSub64 U:G:32, U:G:64, UD:G:64 /branch
ResCond, Imm, Tmp
ResCond, Tmp, Tmp
-BranchNeg32 U:G, UD:G /branch
+BranchNeg32 U:G:32, UZD:G:32 /branch
ResCond, Tmp
-64: BranchNeg64 U:G, UD:G /branch
+64: BranchNeg64 U:G:32, UZD:G:64 /branch
ResCond, Tmp
-MoveConditionally32 U:G, U:G, U:G, U:G, UD:G
+MoveConditionally32 U:G:32, U:G:32, U:G:32, U:G:Ptr, UD:G:Ptr
RelCond, Tmp, Tmp, Tmp, Tmp
-64: MoveConditionally64 U:G, U:G, U:G, U:G, UD:G
+64: MoveConditionally64 U:G:32, U:G:64, U:G:64, U:G:Ptr, UD:G:Ptr
RelCond, Tmp, Tmp, Tmp, Tmp
-MoveConditionallyTest32 U:G, U:G, U:G, U:G, UD:G
+MoveConditionallyTest32 U:G:32, U:G:32, U:G:32, U:G:Ptr, UD:G:Ptr
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-64: MoveConditionallyTest64 U:G, U:G, U:G, U:G, UD:G
+64: MoveConditionallyTest64 U:G:32, U:G:64, U:G:64, U:G:Ptr, UD:G:Ptr
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-MoveConditionallyDouble U:G, U:F, U:F, U:G, UD:G
+MoveConditionallyDouble U:G:32, U:F:64, U:F:64, U:G:Ptr, UD:G:Ptr
DoubleCond, Tmp, Tmp, Tmp, Tmp
-MoveConditionallyFloat U:G, U:F, U:F, U:G, UD:G
+MoveConditionallyFloat U:G:32, U:F:32, U:F:32, U:G:Ptr, UD:G:Ptr
DoubleCond, Tmp, Tmp, Tmp, Tmp
-MoveDoubleConditionally32 U:G, U:G, U:G, U:F, UD:F
+MoveDoubleConditionally32 U:G:32, U:G:32, U:G:32, U:F:64, UD:F:64
RelCond, Tmp, Tmp, Tmp, Tmp
-64: MoveDoubleConditionally64 U:G, U:G, U:G, U:F, UD:F
+64: MoveDoubleConditionally64 U:G:32, U:G:64, U:G:64, U:F:64, UD:F:64
RelCond, Tmp, Tmp, Tmp, Tmp
-MoveDoubleConditionallyTest32 U:G, U:G, U:G, U:F, UD:F
+MoveDoubleConditionallyTest32 U:G:32, U:G:32, U:G:32, U:F:64, UD:F:64
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-64: MoveDoubleConditionallyTest64 U:G, U:G, U:G, U:F, UD:F
+64: MoveDoubleConditionallyTest64 U:G:32, U:G:64, U:G:64, U:F:64, UD:F:64
ResCond, Tmp, Tmp, Tmp, Tmp
ResCond, Tmp, Imm, Tmp, Tmp
-MoveDoubleConditionallyDouble U:G, U:F, U:F, U:F, UD:F
+MoveDoubleConditionallyDouble U:G:32, U:F:64, U:F:64, U:F:64, UD:F:64
DoubleCond, Tmp, Tmp, Tmp, Tmp
-MoveDoubleConditionallyFloat U:G, U:F, U:F, U:F, UD:F
+MoveDoubleConditionallyFloat U:G:32, U:F:32, U:F:32, U:F:64, UD:F:64
DoubleCond, Tmp, Tmp, Tmp, Tmp
Jump /branch
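As an illustration of the new syllabus notation above, and not generated code, the sketch below shows what the two-operand form "Add32 U:G:32, UZD:G:32 / Tmp, Tmp" means once opcode_generator.rb has produced forEachArg: the first argument is a 32-bit use of a GP value and the second is a combined use and zero-extending def of the low 32 bits. The enums and the mock function are invented for the example.

    #include <cstdio>

    enum Role { Use, UseZDef };      // subset of Arg::Role
    enum Type { GP };                // subset of Arg::Type
    enum Width { Width32, Width64 }; // subset of Arg::Width

    // Mock of what the generated Inst::forEachArg reports for "Add32 %src, %dst" (Tmp, Tmp form).
    template<typename Functor>
    void forEachArgOfAdd32(const Functor& functor)
    {
        functor(0, Use, GP, Width32);     // U:G:32   -> reads the low 32 bits of %src
        functor(1, UseZDef, GP, Width32); // UZD:G:32 -> reads and writes the low 32 bits of %dst,
                                          //             zero-filling the rest if %dst is a register
    }

    int main()
    {
        forEachArgOfAdd32([] (int index, Role role, Type, Width) {
            std::printf("arg %d: %s\n", index, role == UseZDef ? "UseZDef" : "Use");
        });
        return 0;
    }
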
Modified: trunk/Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -148,6 +148,7 @@
case BranchTest8:
case BranchTest32:
case BranchTest64:
+ case BranchFloat:
case BranchDouble:
case BranchAdd32:
case BranchAdd64:
Modified: trunk/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp (194330 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -67,7 +67,7 @@
// code is suboptimal.
inst.forEachTmpWithExtraClobberedRegs(
index < block->size() ? &block->at(index) : nullptr,
- [&registerSet] (const Tmp& tmp, Arg::Role role, Arg::Type) {
+ [&registerSet] (const Tmp& tmp, Arg::Role role, Arg::Type, Arg::Width) {
if (tmp.isReg() && Arg::isDef(role))
registerSet.set(tmp.reg());
});
@@ -119,7 +119,7 @@
// Now fall back on spilling using separate Move's to load/store the tmp.
inst.forEachTmp(
- [&] (Tmp& tmp, Arg::Role role, Arg::Type type) {
+ [&] (Tmp& tmp, Arg::Role role, Arg::Type type, Arg::Width) {
if (tmp.isReg())
return;
@@ -140,6 +140,7 @@
}
break;
case Arg::Def:
+ case Arg::ZDef:
for (Reg reg : regsInPriorityOrder(type)) {
if (!setAfter.get(reg)) {
setAfter.set(reg);
@@ -149,6 +150,7 @@
}
break;
case Arg::UseDef:
+ case Arg::UseZDef:
case Arg::LateUse:
for (Reg reg : regsInPriorityOrder(type)) {
if (!setBefore.get(reg) && !setAfter.get(reg)) {
Added: trunk/Source/JavaScriptCore/b3/air/AirTmpWidth.cpp (0 => 194331)
--- trunk/Source/JavaScriptCore/b3/air/AirTmpWidth.cpp (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirTmpWidth.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -0,0 +1,144 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include "AirTmpWidth.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirCode.h"
+#include "AirInstInlines.h"
+
+namespace JSC { namespace B3 { namespace Air {
+
+TmpWidth::TmpWidth()
+{
+}
+
+TmpWidth::TmpWidth(Code& code)
+{
+ recompute(code);
+}
+
+TmpWidth::~TmpWidth()
+{
+}
+
+void TmpWidth::recompute(Code& code)
+{
+ m_width.clear();
+
+ // Assume the worst for registers.
+ RegisterSet::allRegisters().forEach(
+ [&] (Reg reg) {
+ Widths& widths = m_width.add(Tmp(reg), Widths()).iterator->value;
+ Arg::Type type = Arg(Tmp(reg)).type();
+ widths.use = Arg::conservativeWidth(type);
+ widths.def = Arg::conservativeWidth(type);
+ });
+
+ // Now really analyze everything but Move's over Tmp's, but set aside those Move's so we can find
+ // them quickly during the fixpoint below. Note that we can make this analysis stronger by
+ // recognizing more kinds of Move's or anything that has Move-like behavior, though it's probably not
+ // worth it.
+ Vector<Inst*> moves;
+ for (BasicBlock* block : code) {
+ for (Inst& inst : *block) {
+ if (inst.opcode == Move && inst.args[1].isTmp()) {
+ if (inst.args[0].isTmp()) {
+ moves.append(&inst);
+ continue;
+ }
+ if (inst.args[0].isImm()
+ && inst.args[0].value() >= 0) {
+ Tmp tmp = inst.args[1].tmp();
+ Widths& widths = m_width.add(tmp, Widths(Arg::GP)).iterator->value;
+
+ if (inst.args[0].value() <= std::numeric_limits<int8_t>::max())
+ widths.def = std::max(widths.def, Arg::Width8);
+ else if (inst.args[0].value() <= std::numeric_limits<int16_t>::max())
+ widths.def = std::max(widths.def, Arg::Width16);
+ else if (inst.args[0].value() <= std::numeric_limits<int32_t>::max())
+ widths.def = std::max(widths.def, Arg::Width32);
+ else
+ widths.def = std::max(widths.def, Arg::Width64);
+
+ continue;
+ }
+ }
+ inst.forEachTmp(
+ [&] (Tmp& tmp, Arg::Role role, Arg::Type type, Arg::Width width) {
+ Widths& widths = m_width.add(tmp, Widths(type)).iterator->value;
+
+ if (Arg::isAnyUse(role))
+ widths.use = std::max(widths.use, width);
+
+ if (Arg::isZDef(role))
+ widths.def = std::max(widths.def, width);
+ else if (Arg::isDef(role))
+ widths.def = Arg::conservativeWidth(type);
+ });
+ }
+ }
+
+ // Finally, fixpoint over the Move's.
+ bool changed = true;
+ while (changed) {
+ changed = false;
+ for (Inst* move : moves) {
+ ASSERT(move->opcode == Move);
+ ASSERT(move->args[0].isTmp());
+ ASSERT(move->args[1].isTmp());
+
+ Widths& srcWidths = m_width.add(move->args[0].tmp(), Widths(Arg::GP)).iterator->value;
+ Widths& dstWidths = m_width.add(move->args[1].tmp(), Widths(Arg::GP)).iterator->value;
+
+ // Legend:
+ //
+ // Move %src, %dst
+
+ // defWidth(%dst) is a promise about how many high bits are zero. The smaller the width, the
+ // stronger the promise. This Move may weaken that promise if we know that %src is making a
+ // weaker promise. Such forward flow is the only thing that determines defWidth().
+ if (dstWidths.def < srcWidths.def) {
+ dstWidths.def = srcWidths.def;
+ changed = true;
+ }
+
+ // useWidth(%src) is a promise about how many high bits are ignored. The smaller the width,
+ // the stronger the promise. This Move may weaken that promise if we know that %dst is making
+ // a weaker promise. Such backward flow is the only thing that determines useWidth().
+ if (srcWidths.use < dstWidths.use) {
+ srcWidths.use = dstWidths.use;
+ changed = true;
+ }
+ }
+ }
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
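To make the forward/backward flow in recompute()'s fixpoint concrete, here is a small standalone sketch. It is not part of this patch: plain integers stand in for Arg::Width, vector indices stand in for Tmps, and the particular tmps, moves, and starting widths are invented for illustration.

    // Minimal model of TmpWidth's fixpoint over Move's between Tmp's.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Widths {
        unsigned use { 8 }; // stand-in for Arg::minimumWidth(Arg::GP)
        unsigned def { 8 };
    };

    struct Move {
        unsigned src;
        unsigned dst;
    };

    int main()
    {
        std::vector<Widths> widths(3);
        widths[0].def = 32; // t0 is defined by a 32-bit op that zero-extends (e.g. Add32)
        widths[2].use = 64; // t2 feeds a 64-bit address computation, so all bits are read

        // Move t0, t1 and Move t1, t2 were set aside by the main analysis.
        std::vector<Move> moves { { 0, 1 }, { 1, 2 } };

        bool changed = true;
        while (changed) {
            changed = false;
            for (Move move : moves) {
                Widths& src = widths[move.src];
                Widths& dst = widths[move.dst];

                // Forward flow: the destination's "high bits are zero" promise can only
                // be as strong as the source's.
                if (dst.def < src.def) {
                    dst.def = src.def;
                    changed = true;
                }

                // Backward flow: the source's "high bits are ignored" promise can only
                // be as strong as the destination's.
                if (src.use < dst.use) {
                    src.use = dst.use;
                    changed = true;
                }
            }
        }

        for (unsigned i = 0; i < widths.size(); ++i) {
            unsigned width = std::min(widths[i].use, widths[i].def); // same rule as TmpWidth::width()
            std::printf("t%u: use=%u def=%u width=%u\n", i, widths[i].use, widths[i].def, width);
        }
        return 0;
    }

All three tmps end up with an effective width of 32: the zero-extending def flows forward through both moves, so min(use, def) stays at 32 even though the 64-bit use flows backward and widens every use width.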
Added: trunk/Source/_javascript_Core/b3/air/AirTmpWidth.h (0 => 194331)
--- trunk/Source/_javascript_Core/b3/air/AirTmpWidth.h (rev 0)
+++ trunk/Source/_javascript_Core/b3/air/AirTmpWidth.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef AirTmpWidth_h
+#define AirTmpWidth_h
+
+#if ENABLE(B3_JIT)
+
+#include "AirArg.h"
+#include <wtf/HashSet.h>
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+class TmpWidth {
+public:
+ TmpWidth();
+ TmpWidth(Code&);
+ ~TmpWidth();
+
+ void recompute(Code&);
+
+ // The width of a Tmp is the number of bits that you need to be able to track without some trivial
+ // recovery. A Tmp may have a "subwidth" (say, Width32 on a 64-bit system) if either of the following
+ // is true:
+ //
+ // - The high bits are never read.
+ // - The high bits are always zero.
+ //
+ // This doesn't tell you which of those properties holds, but you can query that using the other
+ // methods.
+ Arg::Width width(Tmp tmp) const
+ {
+ auto iter = m_width.find(tmp);
+ if (iter == m_width.end())
+ return Arg::minimumWidth(Arg(tmp).type());
+ return std::min(iter->value.use, iter->value.def);
+ }
+
+ // This indirectly tells you how many of the tmp's high bits are guaranteed to be zero. The number
+ // of high bits that are zero is:
+ //
+ // TotalBits - defWidth(tmp)
+ //
+ // where TotalBits is the total number of bits in the register, so 64 on a 64-bit system.
+ Arg::Width defWidth(Tmp tmp) const
+ {
+ auto iter = m_width.find(tmp);
+ if (iter == m_width.end())
+ return Arg::minimumWidth(Arg(tmp).type());
+ return iter->value.def;
+ }
+
+ // This tells you how many of the Tmp's bits are going to be read.
+ Arg::Width useWidth(Tmp tmp) const
+ {
+ auto iter = m_width.find(tmp);
+ if (iter == m_width.end())
+ return Arg::minimumWidth(Arg(tmp).type());
+ return iter->value.use;
+ }
+
+private:
+ struct Widths {
+ Widths() { }
+
+ Widths(Arg::Type type)
+ {
+ use = Arg::minimumWidth(type);
+ def = Arg::minimumWidth(type);
+ }
+
+ Arg::Width use;
+ Arg::Width def;
+ };
+
+ HashMap<Tmp, Widths> m_width;
+};
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirTmpWidth_h
+
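The two properties spelled out in the class comment, high bits never read versus high bits always zero, are what make it safe to drop a zero-extending move in some situations. The sketch below is not the register allocator's actual coalescing rule; it just illustrates one sufficient condition for treating a Move32 between two tmps as redundant. The Widths struct, the numeric widths, and move32IsCoalescable are all invented for the example.

    #include <cstdio>

    // Per-tmp use/def widths, with plain integers standing in for Arg::Width.
    struct Widths {
        unsigned use;
        unsigned def;
    };

    // Move32 %src, %dst zero-extends %src into %dst. Coalescing the two tmps drops
    // that zero-extension, which is harmless if either:
    //   - %src's high bits are already known to be zero (defWidth(src) <= 32), or
    //   - %dst's high bits are never read (useWidth(dst) <= 32).
    static bool move32IsCoalescable(const Widths& src, const Widths& dst)
    {
        return src.def <= 32 || dst.use <= 32;
    }

    int main()
    {
        Widths src { 32, 32 }; // e.g. defined by a 32-bit op and used as an array index
        Widths dst { 64, 64 }; // e.g. read by a 64-bit address computation
        std::printf("coalescable: %s\n", move32IsCoalescable(src, dst) ? "yes" : "no");
        return 0;
    }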
Modified: trunk/Source/_javascript_Core/b3/air/AirUseCounts.h (194330 => 194331)
--- trunk/Source/_javascript_Core/b3/air/AirUseCounts.h 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/b3/air/AirUseCounts.h 2015-12-21 16:16:01 UTC (rev 194331)
@@ -77,7 +77,7 @@
frequency *= Options::rareBlockPenalty();
for (Inst& inst : *block) {
inst.forEach<Thing>(
- [&] (Thing& arg, Arg::Role role, Arg::Type) {
+ [&] (Thing& arg, Arg::Role role, Arg::Type, Arg::Width) {
Counts& counts = m_counts.add(arg, Counts()).iterator->value;
if (Arg::isWarmUse(role))
Modified: trunk/Source/_javascript_Core/b3/air/opcode_generator.rb (194330 => 194331)
--- trunk/Source/_javascript_Core/b3/air/opcode_generator.rb 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/b3/air/opcode_generator.rb 2015-12-21 16:16:01 UTC (rev 194331)
@@ -44,11 +44,12 @@
end
class Arg
- attr_reader :role, :type
+ attr_reader :role, :type, :width
- def initialize(role, type)
+ def initialize(role, type, width)
@role = role
@type = type
+ @width = width
end
end
@@ -173,7 +174,7 @@
end
def isUD(token)
- token =~ /\A((U)|(D)|(UD)|(UA))\Z/
+ token =~ /\A((U)|(D)|(UD)|(ZD)|(UZD)|(UA))\Z/
end
def isGF(token)
@@ -188,8 +189,12 @@
token =~ /\A((x86)|(x86_32)|(x86_64)|(arm)|(armv7)|(arm64)|(32)|(64))\Z/
end
+def isWidth(token)
+ token =~ /\A((8)|(16)|(32)|(64)|(Ptr))\Z/
+end
+
def isKeyword(token)
- isUD(token) or isGF(token) or isKind(token) or isArch(token) or
+ isUD(token) or isGF(token) or isKind(token) or isArch(token) or isWidth(token) or
token == "special" or token == "as"
end
@@ -256,6 +261,13 @@
result
end
+ def consumeWidth
+ result = token.string
+ parseError("Expected width (8, 16, 32, or 64)") unless isWidth(result)
+ advance
+ result
+ end
+
def parseArchs
return nil unless isArch(token)
@@ -350,8 +362,10 @@
role = consumeRole
consume(":")
type = consumeType
+ consume(":")
+ width = consumeWidth
- signature << Arg.new(role, type)
+ signature << Arg.new(role, type, width)
break unless token == ","
consume(",")
@@ -606,26 +620,37 @@
matchInstOverload(outp, :fast, "this") {
| opcode, overload |
if opcode.special
- outp.puts "functor(args[0], Arg::Use, Arg::GP); // This is basically bogus, but it works f analyses model Special as an immediate."
+ outp.puts "functor(args[0], Arg::Use, Arg::GP, Arg::pointerWidth()); // This is basically bogus, but it works for analyses that model Special as an immediate."
outp.puts "args[0].special()->forEachArg(*this, scopedLambda<EachArgCallback>(functor));"
else
overload.signature.each_with_index {
| arg, index |
+
role = nil
case arg.role
when "U"
role = "Use"
when "D"
role = "Def"
+ when "ZD"
+ role = "ZDef"
when "UD"
role = "UseDef"
+ when "UZD"
+ role = "UseZDef"
when "UA"
role = "UseAddr"
else
raise
end
+
+ if arg.width == "Ptr"
+ width = "Arg::pointerWidth()"
+ else
+ width = "Arg::Width#{arg.width}"
+ end
- outp.puts "functor(args[#{index}], Arg::#{role}, Arg::#{arg.type}P);"
+ outp.puts "functor(args[#{index}], Arg::#{role}, Arg::#{arg.type}P, #{width});"
}
end
}
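With these parser and emitter changes, every operand in an opcode description carries a role, a bank, and a width, and the generated forEachArg passes that width through to the functor. The sketch below approximates the shape of the emitted code for a hypothetical 32-bit add-like opcode described as "U:G:32, UZD:G:32"; the enums, the opcode description, and forEachArgOfAdd32Like are simplified stand-ins for Arg::Role, Arg::Type, Arg::Width, and the generator's real output.

    #include <cstdio>

    enum class Role { Use, Def, ZDef, UseDef, UseZDef, UseAddr };
    enum class Type { GP, FP };
    enum class Width { W8 = 8, W16 = 16, W32 = 32, W64 = 64 };

    struct Arg {
        const char* name;
    };

    template<typename Functor>
    void forEachArgOfAdd32Like(Arg* args, const Functor& functor)
    {
        // Roughly what the generator prints for "U:G:32, UZD:G:32": the second
        // operand is both read and zero-extended when written (UseZDef).
        functor(args[0], Role::Use, Type::GP, Width::W32);
        functor(args[1], Role::UseZDef, Type::GP, Width::W32);
    }

    int main()
    {
        Arg args[2] = { { "%x" }, { "%y" } };
        forEachArgOfAdd32Like(
            args,
            [] (Arg& arg, Role role, Type, Width width) {
                std::printf("%s role=%d width=%u\n", arg.name, static_cast<int>(role), static_cast<unsigned>(width));
            });
        return 0;
    }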
Modified: trunk/Source/_javascript_Core/b3/testb3.cpp (194330 => 194331)
--- trunk/Source/_javascript_Core/b3/testb3.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/b3/testb3.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -6368,6 +6368,61 @@
CHECK(invoke<int>(*code, &value - 2, 1) == 42);
}
+void testCheckTrickyMegaCombo()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* base = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+ Value* index = root->appendNew<Value>(
+ proc, ZExt32, Origin(),
+ root->appendNew<Value>(
+ proc, Add, Origin(),
+ root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1)),
+ root->appendNew<Const32Value>(proc, Origin(), 1)));
+
+ Value* ptr = root->appendNew<Value>(
+ proc, Add, Origin(), base,
+ root->appendNew<Value>(
+ proc, Shl, Origin(), index,
+ root->appendNew<Const32Value>(proc, Origin(), 1)));
+
+ CheckValue* check = root->appendNew<CheckValue>(
+ proc, Check, Origin(),
+ root->appendNew<Value>(
+ proc, LessThan, Origin(),
+ root->appendNew<MemoryValue>(proc, Load8S, Origin(), ptr),
+ root->appendNew<Const32Value>(proc, Origin(), 42)));
+ check->setGenerator(
+ [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+ AllowMacroScratchRegisterUsage allowScratch(jit);
+ CHECK(!params.size());
+
+ // This should always work because a function this simple should never have callee
+ // saves.
+ jit.move(CCallHelpers::TrustedImm32(42), GPRInfo::returnValueGPR);
+ jit.emitFunctionEpilogue();
+ jit.ret();
+ });
+ root->appendNew<ControlValue>(
+ proc, Return, Origin(), root->appendNew<Const32Value>(proc, Origin(), 0));
+
+ auto code = compile(proc);
+
+ int8_t value;
+ value = 42;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 0);
+ value = 127;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 0);
+ value = 41;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 42);
+ value = 0;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 42);
+ value = -1;
+ CHECK(invoke<int>(*code, &value - 2, 0) == 42);
+}
+
void testCheckTwoMegaCombos()
{
Procedure proc;
@@ -9474,6 +9529,7 @@
RUN(testSimpleCheck());
RUN(testCheckLessThan());
RUN(testCheckMegaCombo());
+ RUN(testCheckTrickyMegaCombo());
RUN(testCheckTwoMegaCombos());
RUN(testCheckTwoNonRedundantMegaCombos());
RUN(testCheckAddImm());
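The arithmetic behind testCheckTrickyMegaCombo's expectations is worth spelling out. With the second argument equal to 0, the index is ZExt32(Trunc(0) + 1) = 1, the Shl by 1 scales that to a byte offset of 2, and the base passed in is &value - 2, so the Load8S lands exactly on value. If the high bits of the 32-bit add were garbage rather than zero, the computed pointer would be far away, which is presumably why the test zero-extends an Add of a truncated argument. The sketch below is plain C++, not JSC code, and replays that arithmetic with a small buffer standing in for the test's base pointer.

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        // The test passes &value - 2 as the base, so here the interesting byte
        // lives at offset 2 of a small buffer.
        int8_t buffer[3] = { 0, 0, 41 };
        int8_t* base = buffer;

        // index = ZExt32(Trunc(arg1) + 1) with arg1 == 0; ptr = base + (index << 1).
        uint64_t index = static_cast<uint32_t>(0 + 1);
        int8_t* ptr = base + (index << 1);

        // Loads 41, which is < 42, so the Check fires and the test expects 42 back.
        std::printf("loaded %d\n", *ptr);
        return 0;
    }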
Modified: trunk/Source/_javascript_Core/ftl/FTLLowerDFGToLLVM.cpp (194330 => 194331)
--- trunk/Source/_javascript_Core/ftl/FTLLowerDFGToLLVM.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/ftl/FTLLowerDFGToLLVM.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -4218,10 +4218,17 @@
LValue length = m_out.load32(kids[0], m_heaps.JSString_length);
for (unsigned i = 1; i < numKids; ++i) {
flags = m_out.bitAnd(flags, m_out.load32(kids[i], m_heaps.JSString_flags));
+#if FTL_USES_B3
+ B3::CheckValue* lengthCheck = m_out.speculateAdd(
+ length, m_out.load32(kids[i], m_heaps.JSString_length));
+ blessSpeculation(lengthCheck, Uncountable, noValue(), nullptr, m_origin);
+ length = lengthCheck;
+#else // FTL_USES_B3
LValue lengthAndOverflow = m_out.addWithOverflow32(
length, m_out.load32(kids[i], m_heaps.JSString_length));
speculate(Uncountable, noValue(), 0, m_out.extractValue(lengthAndOverflow, 1));
length = m_out.extractValue(lengthAndOverflow, 0);
+#endif // FTL_USES_B3
}
m_out.store32(
m_out.bitAnd(m_out.constInt32(JSString::Is8Bit), flags),
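The two sides of this hunk compute the same checked 32-bit addition in different shapes: the LLVM path gets a (sum, overflowed) pair back from addWithOverflow32 and extracts each half, while the B3 path gets a single CheckValue that serves as both the overflow check and the sum. The sketch below models only the pair shape; it uses the GCC/Clang builtin __builtin_add_overflow, which is not FTL code, purely to illustrate the result-plus-flag pattern.

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        int32_t length = 0x7fffffff; // INT32_MAX, so adding any positive kid length overflows
        int32_t kidLength = 1;

        int32_t sum;
        bool overflowed = __builtin_add_overflow(length, kidLength, &sum);
        if (overflowed)
            std::printf("rope length overflowed; this is where the code would speculate (OSR exit)\n");
        else
            std::printf("new length: %d\n", sum);
        return 0;
    }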
Modified: trunk/Source/_javascript_Core/ftl/FTLOSRExitHandle.cpp (194330 => 194331)
--- trunk/Source/_javascript_Core/ftl/FTLOSRExitHandle.cpp 2015-12-21 15:16:58 UTC (rev 194330)
+++ trunk/Source/_javascript_Core/ftl/FTLOSRExitHandle.cpp 2015-12-21 16:16:01 UTC (rev 194331)
@@ -39,9 +39,10 @@
label = jit.label();
jit.pushToSaveImmediateWithoutTouchingRegisters(CCallHelpers::TrustedImm32(index));
CCallHelpers::PatchableJump jump = jit.patchableJump();
+ RefPtr<OSRExitHandle> self = this;
jit.addLinkTask(
- [this, jump] (LinkBuffer& linkBuffer) {
- exit.m_patchableJump = CodeLocationJump(linkBuffer.locationOf(jump));
+ [self, jump] (LinkBuffer& linkBuffer) {
+ self->exit.m_patchableJump = CodeLocationJump(linkBuffer.locationOf(jump));
linkBuffer.link(
jump.m_jump,