Modified: trunk/Source/JavaScriptCore/ChangeLog (196512 => 196513)
--- trunk/Source/JavaScriptCore/ChangeLog 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/ChangeLog 2016-02-12 22:32:44 UTC (rev 196513)
@@ -1,3 +1,187 @@
+2016-02-12 Benjamin Poulain <[email protected]>
+
+ [JSC] On x86, improve the selection of which values are selected for the UseDef part of commutative operations
+ https://bugs.webkit.org/show_bug.cgi?id=154151
+
+ Reviewed by Filip Pizlo.
+
+ Previously, when an instruction destroyed an argument with
+ a UseDef use, we would try to pick a good target for the UseDef
+ while doing instruction selection.
+
+ For example:
+ @x = Add(@1, @2)
+
+ can be lowered to:
+ Move @1 Tmp3
+ Add @2 Tmp3
+ or
+ Move @2 Tmp3
+ Add @1 Tmp3
+
+ The choice of which value ends up copied is done by preferRightForResult()
+ at lowering time.
+
+ There are two common problems with the code we generate:
+ 1) It is based on UseCount. A value at its last use is a good target
+ for coalescing even if its use count is greater than 1 (see the sketch after this list).
+ 2) When both values are at their last use, the best choice
+ depends on the register pressure of each. We don't have that information
+ until we do register allocation.
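+
+ For example (an illustrative sketch, in the same notation as above):
+ @y = Add(@a, @b)
+ If this is the last use of @a, coalescing @a with the result is profitable
+ even when @a's use count is greater than 1, which a pure UseCount heuristic misses.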
+
+ This patch implements a simple idea to minimize how many of those Moves are needed.
+ Each commutative operation gets a 3-operand variant. The register allocator then attempts
+ to alias *both* of the source operands to the destination.
+ Since our aliasing is conservative, it removes as many copies as possible without causing
+ spilling.
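+
+ For example (a sketch; the Tmp names are illustrative), the lowering can now emit:
+ Add32 Tmp1, Tmp2, Tmp3
+ and the allocator records Tmp1/Tmp3 and Tmp2/Tmp3 as low-priority coalescing
+ candidates. Whichever pair can be coalesced without spilling turns the
+ instruction back into a plain two-operand add with no extra Move.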
+
+ There was an unexpectedly cool improvement too. If you have:
+ Move Tmp1, Tmp2
+ BranchAdd32 Tmp3, Tmp2
+ we would previously restore Tmp2 by subtracting Tmp3 from the result.
+ We can now just use Tmp1. That removes quite a few Subs from the slow paths.
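+ Concretely (the operand names here are illustrative), the three-operand form is:
+ BranchAdd32 Tmp3, Tmp1, Tmp2
+ and when Tmp2 ends up distinct from both sources, nothing has to be undone
+ on the slow path: Tmp1 still holds the original value.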
+
+ The problem is that this simple idea uncovered a bunch of issues that had to be fixed too.
+ I detail them inline below.
+
+ * assembler/MacroAssemblerARM64.h:
+ (JSC::MacroAssemblerARM64::and64):
+ * assembler/MacroAssemblerX86Common.h:
+ Most of the additions are Address versions of the 3-operand opcodes.
+ The reason for this is to allow the complex addressing forms of the
+ instructions when spilling.
+
+ (JSC::MacroAssemblerX86Common::and32):
+ (JSC::MacroAssemblerX86Common::mul32):
+ (JSC::MacroAssemblerX86Common::or32):
+ (JSC::MacroAssemblerX86Common::xor32):
+ (JSC::MacroAssemblerX86Common::moveDouble):
+ This was an unexpected discovery: removing tons of Move32 made floating-point-heavy
+ code much slower.
+
+ It turns out the MoveDouble we were using has partial register dependencies.
+
+ The x86 optimization manual, Chapter 3, section 3.4.1.13, lists the move instructions
+ that are executed directly by the frontend. That's what we use now.
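+
+ Roughly (an illustrative comparison, AT&T syntax):
+ movsd %xmm0, %xmm1 # writes only the low 64 bits of %xmm1, so it
+ # depends on the old value of %xmm1
+ movaps %xmm0, %xmm1 # writes the full register; this is one of the
+ # moves the frontend can handle by itself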
+
+ (JSC::MacroAssemblerX86Common::addDouble):
+ (JSC::MacroAssemblerX86Common::addFloat):
+ (JSC::MacroAssemblerX86Common::mulDouble):
+ (JSC::MacroAssemblerX86Common::mulFloat):
+ (JSC::MacroAssemblerX86Common::andDouble):
+ (JSC::MacroAssemblerX86Common::andFloat):
+ (JSC::MacroAssemblerX86Common::xorDouble):
+ (JSC::MacroAssemblerX86Common::xorFloat):
+ If the destination is not aliased, the versions taking an Address
+ use LoadFloat/LoadDouble instead of direct addressing.
+
+ That is because this:
+ Move Tmp1, Tmp2
+ Op [Tmp3], Tmp2
+ is slower than
+ Move [Tmp3], Tmp2
+ Op Tmp1, Tmp2
+ (sometimes significantly).
+
+ I am not exactly sure why.
+
+ (JSC::MacroAssemblerX86Common::branchAdd32):
+ * assembler/MacroAssemblerX86_64.h:
+ (JSC::MacroAssemblerX86_64::and64):
+ (JSC::MacroAssemblerX86_64::mul64):
+ (JSC::MacroAssemblerX86_64::xor64):
+ (JSC::MacroAssemblerX86_64::branchAdd64):
+ * assembler/X86Assembler.h:
+ (JSC::X86Assembler::movapd_rr):
+ (JSC::X86Assembler::movaps_rr):
+ * b3/B3CheckSpecial.cpp:
+ (JSC::B3::CheckSpecial::shouldTryAliasingDef):
+ (JSC::B3::CheckSpecial::generate):
+ * b3/B3CheckSpecial.h:
+ * b3/B3LowerToAir.cpp:
+ (JSC::B3::Air::LowerToAir::lower):
+ * b3/air/AirCustom.h:
+ (JSC::B3::Air::PatchCustom::shouldTryAliasingDef):
+ * b3/air/AirInst.h:
+ * b3/air/AirInstInlines.h:
+ (JSC::B3::Air::Inst::shouldTryAliasingDef):
+ * b3/air/AirIteratedRegisterCoalescing.cpp:
+ Aliasing the operands is done the same way as any coalescing.
+
+ There were problems with considering all those coalescing
+ candidates as equivalent for the result.
+
+ Moves are mostly generated for Upsilon-Phis. Getting rid of
+ those tends to give better loops.
+
+ Sometimes, blocks have only Phis and a Jump. Coalescing
+ those moves gets rid of the block entirely.
+
+ Where it got interesting was that something like:
+ Move Tmp1, Tmp2
+ Op Tmp3, Tmp2
+ was significantly better than:
+ Op Tmp1, Tmp3
+ Move Tmp1, Tmp4
+ even in the same basic block.
+
+ To get back to the same performance, I had to prioritize
+ regular Move operations over argument coalescing.
+
+ Another argument for doing this is that the alias has a shorter
+ life in the hardware because the operation itself gets a new
+ virtual register from the bank.
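+
+ As a sketch (the Tmp names are illustrative):
+ Move Tmp1, Tmp2
+ coming from an Upsilon/Phi stays a regular coalescing candidate, while for
+ Add32 Tmp3, Tmp4, Tmp5
+ the Tmp3/Tmp5 and Tmp4/Tmp5 pairs only go on the low-priority list and are
+ considered only once the regular candidates have been exhausted.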
+
+ * b3/air/AirOpcode.opcodes:
+ * b3/air/AirSpecial.cpp:
+ (JSC::B3::Air::Special::shouldTryAliasingDef):
+ * b3/air/AirSpecial.h:
+ * b3/testb3.cpp:
+ (JSC::B3::testCheckAddArgumentAliasing64):
+ (JSC::B3::testCheckAddArgumentAliasing32):
+ (JSC::B3::testCheckAddSelfOverflow64):
+ (JSC::B3::testCheckAddSelfOverflow32):
+ (JSC::B3::testCheckMulArgumentAliasing64):
+ (JSC::B3::testCheckMulArgumentAliasing32):
+ (JSC::B3::run):
+
+ * dfg/DFGOSRExitCompilerCommon.cpp:
+ (JSC::DFG::reifyInlinedCallFrames):
+ * jit/AssemblyHelpers.h:
+ (JSC::AssemblyHelpers::emitSaveOrCopyCalleeSavesFor):
+ This ruined my week.
+
+ When regenerating the frame of an inlined function that
+ was called through a tail call, we were ignoring r13 for some reason.
+
+ Since this patch makes it more likely to increase the degree
+ of each Tmp, the number of registers used increased and r13 was more
+ commonly used.
+
+ When getting out of OSRExit, we would have that value trashed :(
+
+ The fix is simply to restore it like the other two Baseline callee-saved
+ registers.
+
2016-02-12 Yusuke Suzuki <[email protected]>
[ES6] Implement @@search
Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerARM64.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerARM64.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerARM64.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -362,6 +362,11 @@
and32(dataTempRegister, dest);
}
+ void and64(RegisterID src1, RegisterID src2, RegisterID dest)
+ {
+ m_assembler.and_<64>(dest, src1, src2);
+ }
+
void and64(RegisterID src, RegisterID dest)
{
m_assembler.and_<64>(dest, dest, src);
Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -267,6 +267,18 @@
}
}
+ void and32(Address op1, RegisterID op2, RegisterID dest)
+ {
+ move(op2, dest);
+ and32(op1, dest);
+ }
+
+ void and32(RegisterID op1, Address op2, RegisterID dest)
+ {
+ move(op1, dest);
+ and32(op2, dest);
+ }
+
void and32(TrustedImm32 imm, RegisterID src, RegisterID dest)
{
move(src, dest);
@@ -334,10 +346,32 @@
m_assembler.imull_rr(src, dest);
}
+ void mul32(RegisterID src1, RegisterID src2, RegisterID dest)
+ {
+ if (src2 == dest) {
+ m_assembler.imull_rr(src1, dest);
+ return;
+ }
+ move(src1, dest);
+ m_assembler.imull_rr(src2, dest);
+ }
+
void mul32(Address src, RegisterID dest)
{
m_assembler.imull_mr(src.offset, src.base, dest);
}
+
+ void mul32(Address src1, RegisterID src2, RegisterID dest)
+ {
+ move(src2, dest);
+ mul32(src1, dest);
+ }
+
+ void mul32(RegisterID src1, Address src2, RegisterID dest)
+ {
+ move(src1, dest);
+ mul32(src2, dest);
+ }
void mul32(TrustedImm32 imm, RegisterID src, RegisterID dest)
{
@@ -415,6 +449,18 @@
}
}
+ void or32(Address op1, RegisterID op2, RegisterID dest)
+ {
+ move(op2, dest);
+ or32(op1, dest);
+ }
+
+ void or32(RegisterID op1, Address op2, RegisterID dest)
+ {
+ move(op1, dest);
+ or32(op2, dest);
+ }
+
void or32(TrustedImm32 imm, RegisterID src, RegisterID dest)
{
move(src, dest);
@@ -566,6 +612,18 @@
}
}
+ void xor32(Address op1, RegisterID op2, RegisterID dest)
+ {
+ move(op2, dest);
+ xor32(op1, dest);
+ }
+
+ void xor32(RegisterID op1, Address op2, RegisterID dest)
+ {
+ move(op1, dest);
+ xor32(op2, dest);
+ }
+
void xor32(TrustedImm32 imm, RegisterID src, RegisterID dest)
{
move(src, dest);
@@ -905,7 +963,7 @@
{
ASSERT(isSSE2Present());
if (src != dest)
- m_assembler.movsd_rr(src, dest);
+ m_assembler.movaps_rr(src, dest);
}
void loadDouble(TrustedImmPtr address, FPRegisterID dest)
@@ -1014,6 +1072,30 @@
m_assembler.addsd_mr(src.offset, src.base, dest);
}
+ void addDouble(Address op1, FPRegisterID op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op2 == dest) {
+ addDouble(op1, dest);
+ return;
+ }
+
+ loadDouble(op1, dest);
+ addDouble(op2, dest);
+ }
+
+ void addDouble(FPRegisterID op1, Address op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op1 == dest) {
+ addDouble(op2, dest);
+ return;
+ }
+
+ loadDouble(op2, dest);
+ addDouble(op1, dest);
+ }
+
void addFloat(FPRegisterID src, FPRegisterID dest)
{
ASSERT(isSSE2Present());
@@ -1026,6 +1108,41 @@
m_assembler.addss_mr(src.offset, src.base, dest);
}
+ void addFloat(FPRegisterID op1, FPRegisterID op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op1 == dest)
+ addFloat(op2, dest);
+ else {
+ moveDouble(op2, dest);
+ addFloat(op1, dest);
+ }
+ }
+
+ void addFloat(Address op1, FPRegisterID op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op2 == dest) {
+ addFloat(op1, dest);
+ return;
+ }
+
+ loadFloat(op1, dest);
+ addFloat(op2, dest);
+ }
+
+ void addFloat(FPRegisterID op1, Address op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op1 == dest) {
+ addFloat(op2, dest);
+ return;
+ }
+
+ loadFloat(op2, dest);
+ addFloat(op1, dest);
+ }
+
void divDouble(FPRegisterID src, FPRegisterID dest)
{
ASSERT(isSSE2Present());
@@ -1115,6 +1232,28 @@
m_assembler.mulsd_mr(src.offset, src.base, dest);
}
+ void mulDouble(Address op1, FPRegisterID op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op2 == dest) {
+ mulDouble(op1, dest);
+ return;
+ }
+ loadDouble(op1, dest);
+ mulDouble(op2, dest);
+ }
+
+ void mulDouble(FPRegisterID op1, Address op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op1 == dest) {
+ mulDouble(op2, dest);
+ return;
+ }
+ loadDouble(op2, dest);
+ mulDouble(op1, dest);
+ }
+
void mulFloat(FPRegisterID src, FPRegisterID dest)
{
ASSERT(isSSE2Present());
@@ -1127,27 +1266,100 @@
m_assembler.mulss_mr(src.offset, src.base, dest);
}
+ void mulFloat(FPRegisterID op1, FPRegisterID op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op1 == dest)
+ mulFloat(op2, dest);
+ else {
+ moveDouble(op2, dest);
+ mulFloat(op1, dest);
+ }
+ }
+
+ void mulFloat(Address op1, FPRegisterID op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op2 == dest) {
+ mulFloat(op1, dest);
+ return;
+ }
+ loadFloat(op1, dest);
+ mulFloat(op2, dest);
+ }
+
+ void mulFloat(FPRegisterID op1, Address op2, FPRegisterID dest)
+ {
+ ASSERT(isSSE2Present());
+ if (op1 == dest) {
+ mulFloat(op2, dest);
+ return;
+ }
+ loadFloat(op2, dest);
+ mulFloat(op1, dest);
+ }
+
void andDouble(FPRegisterID src, FPRegisterID dst)
{
// ANDPS is defined on 128bits and is shorter than ANDPD.
m_assembler.andps_rr(src, dst);
}
+ void andDouble(FPRegisterID src1, FPRegisterID src2, FPRegisterID dst)
+ {
+ if (src1 == dst)
+ andDouble(src2, dst);
+ else {
+ moveDouble(src2, dst);
+ andDouble(src1, dst);
+ }
+ }
+
void andFloat(FPRegisterID src, FPRegisterID dst)
{
m_assembler.andps_rr(src, dst);
}
+ void andFloat(FPRegisterID src1, FPRegisterID src2, FPRegisterID dst)
+ {
+ if (src1 == dst)
+ andFloat(src2, dst);
+ else {
+ moveDouble(src2, dst);
+ andFloat(src1, dst);
+ }
+ }
+
void xorDouble(FPRegisterID src, FPRegisterID dst)
{
m_assembler.xorps_rr(src, dst);
}
+ void xorDouble(FPRegisterID src1, FPRegisterID src2, FPRegisterID dst)
+ {
+ if (src1 == dst)
+ xorDouble(src2, dst);
+ else {
+ moveDouble(src2, dst);
+ xorDouble(src1, dst);
+ }
+ }
+
void xorFloat(FPRegisterID src, FPRegisterID dst)
{
m_assembler.xorps_rr(src, dst);
}
+ void xorFloat(FPRegisterID src1, FPRegisterID src2, FPRegisterID dst)
+ {
+ if (src1 == dst)
+ xorFloat(src2, dst);
+ else {
+ moveDouble(src2, dst);
+ xorFloat(src1, dst);
+ }
+ }
+
void convertInt32ToDouble(RegisterID src, FPRegisterID dest)
{
ASSERT(isSSE2Present());
@@ -1710,6 +1922,18 @@
return branchAdd32(cond, src1, dest);
}
+ Jump branchAdd32(ResultCondition cond, Address src1, RegisterID src2, RegisterID dest)
+ {
+ move(src2, dest);
+ return branchAdd32(cond, src1, dest);
+ }
+
+ Jump branchAdd32(ResultCondition cond, RegisterID src1, Address src2, RegisterID dest)
+ {
+ move(src1, dest);
+ return branchAdd32(cond, src2, dest);
+ }
+
Jump branchAdd32(ResultCondition cond, RegisterID src, TrustedImm32 imm, RegisterID dest)
{
move(src, dest);
Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -349,6 +349,18 @@
and64(scratchRegister(), srcDest);
}
+ void and64(RegisterID op1, RegisterID op2, RegisterID dest)
+ {
+ if (op1 == op2 && op1 != dest && op2 != dest)
+ move(op1, dest);
+ else if (op1 == dest)
+ and64(op2, dest);
+ else {
+ move(op2, dest);
+ and64(op1, dest);
+ }
+ }
+
void countLeadingZeros64(RegisterID src, RegisterID dst)
{
if (supportsLZCNT()) {
@@ -430,6 +442,16 @@
{
m_assembler.imulq_rr(src, dest);
}
+
+ void mul64(RegisterID src1, RegisterID src2, RegisterID dest)
+ {
+ if (src2 == dest) {
+ m_assembler.imulq_rr(src1, dest);
+ return;
+ }
+ move(src1, dest);
+ m_assembler.imulq_rr(src2, dest);
+ }
void x86ConvertToQuadWord64()
{
@@ -541,6 +563,18 @@
{
m_assembler.xorq_rr(src, dest);
}
+
+ void xor64(RegisterID op1, RegisterID op2, RegisterID dest)
+ {
+ if (op1 == op2)
+ move(TrustedImm32(0), dest);
+ else if (op1 == dest)
+ xor64(op2, dest);
+ else {
+ move(op2, dest);
+ xor64(op1, dest);
+ }
+ }
void xor64(RegisterID src, Address dest)
{
@@ -867,12 +901,38 @@
return Jump(m_assembler.jCC(x86Condition(cond)));
}
+ Jump branchAdd64(ResultCondition cond, RegisterID src1, RegisterID src2, RegisterID dest)
+ {
+ if (src1 == dest)
+ return branchAdd64(cond, src2, dest);
+ move(src2, dest);
+ return branchAdd64(cond, src1, dest);
+ }
+
+ Jump branchAdd64(ResultCondition cond, Address src1, RegisterID src2, RegisterID dest)
+ {
+ move(src2, dest);
+ return branchAdd64(cond, src1, dest);
+ }
+
+ Jump branchAdd64(ResultCondition cond, RegisterID src1, Address src2, RegisterID dest)
+ {
+ move(src1, dest);
+ return branchAdd64(cond, src2, dest);
+ }
+
Jump branchAdd64(ResultCondition cond, RegisterID src, RegisterID dest)
{
add64(src, dest);
return Jump(m_assembler.jCC(x86Condition(cond)));
}
+ Jump branchAdd64(ResultCondition cond, Address src, RegisterID dest)
+ {
+ add64(src, dest);
+ return Jump(m_assembler.jCC(x86Condition(cond)));
+ }
+
Jump branchMul64(ResultCondition cond, RegisterID src, RegisterID dest)
{
mul64(src, dest);
Modified: trunk/Source/JavaScriptCore/assembler/X86Assembler.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/assembler/X86Assembler.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/assembler/X86Assembler.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -263,6 +263,8 @@
OP2_MOVSD_WsdVsd = 0x11,
OP2_MOVSS_VsdWsd = 0x10,
OP2_MOVSS_WsdVsd = 0x11,
+ OP2_MOVAPD_VpdWpd = 0x28,
+ OP2_MOVAPS_VpdWpd = 0x28,
OP2_CVTSI2SD_VsdEd = 0x2A,
OP2_CVTTSD2SI_GdWsd = 0x2C,
OP2_UCOMISD_VsdWsd = 0x2E,
@@ -2209,6 +2211,17 @@
}
#endif
+ void movapd_rr(XMMRegisterID src, XMMRegisterID dst)
+ {
+ m_formatter.prefix(PRE_SSE_66);
+ m_formatter.twoByteOp(OP2_MOVAPD_VpdWpd, (RegisterID)dst, (RegisterID)src);
+ }
+
+ void movaps_rr(XMMRegisterID src, XMMRegisterID dst)
+ {
+ m_formatter.twoByteOp(OP2_MOVAPS_VpdWpd, (RegisterID)dst, (RegisterID)src);
+ }
+
void movsd_rr(XMMRegisterID src, XMMRegisterID dst)
{
m_formatter.prefix(PRE_SSE_F2);
Modified: trunk/Source/JavaScriptCore/b3/B3CheckSpecial.cpp (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/B3CheckSpecial.cpp 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/B3CheckSpecial.cpp 2016-02-12 22:32:44 UTC (rev 196513)
@@ -130,6 +130,15 @@
return admitsStackImpl(numB3Args(inst), m_numCheckArgs + 1, inst, argIndex);
}
+bool CheckSpecial::shouldTryAliasingDef(Inst& inst, unsigned& defIndex)
+{
+ if (hiddenBranch(inst).shouldTryAliasingDef(defIndex)) {
+ defIndex += 1;
+ return true;
+ }
+ return false;
+}
+
CCallHelpers::Jump CheckSpecial::generate(Inst& inst, CCallHelpers& jit, GenerationContext& context)
{
CCallHelpers::Jump fail = hiddenBranch(inst).generate(jit, context);
@@ -154,7 +163,8 @@
// If necessary, undo the operation.
switch (m_checkOpcode) {
case BranchAdd32:
- if (args[1] == args[2]) {
+ if ((m_numCheckArgs == 4 && args[1] == args[2] && args[2] == args[3])
+ || (m_numCheckArgs == 3 && args[1] == args[2])) {
// This is ugly, but that's fine - we won't have to do this very often.
ASSERT(args[1].isGPR());
GPRReg valueGPR = args[1].gpr();
@@ -167,10 +177,17 @@
jit.popToRestore(scratchGPR);
break;
}
- Inst(Sub32, nullptr, args[1], args[2]).generate(jit, context);
+ if (m_numCheckArgs == 4) {
+ if (args[1] == args[3])
+ Inst(Sub32, nullptr, args[2], args[3]).generate(jit, context);
+ else if (args[2] == args[3])
+ Inst(Sub32, nullptr, args[1], args[3]).generate(jit, context);
+ } else if (m_numCheckArgs == 3)
+ Inst(Sub32, nullptr, args[1], args[2]).generate(jit, context);
break;
case BranchAdd64:
- if (args[1] == args[2]) {
+ if ((m_numCheckArgs == 4 && args[1] == args[2] && args[2] == args[3])
+ || (m_numCheckArgs == 3 && args[1] == args[2])) {
// This is ugly, but that's fine - we won't have to do this very often.
ASSERT(args[1].isGPR());
GPRReg valueGPR = args[1].gpr();
@@ -183,7 +200,13 @@
jit.popToRestore(scratchGPR);
break;
}
- Inst(Sub64, nullptr, args[1], args[2]).generate(jit, context);
+ if (m_numCheckArgs == 4) {
+ if (args[1] == args[3])
+ Inst(Sub64, nullptr, args[2], args[3]).generate(jit, context);
+ else if (args[2] == args[3])
+ Inst(Sub64, nullptr, args[1], args[3]).generate(jit, context);
+ } else if (m_numCheckArgs == 3)
+ Inst(Sub64, nullptr, args[1], args[2]).generate(jit, context);
break;
case BranchSub32:
Inst(Add32, nullptr, args[1], args[2]).generate(jit, context);
Modified: trunk/Source/JavaScriptCore/b3/B3CheckSpecial.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/B3CheckSpecial.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/B3CheckSpecial.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -126,6 +126,7 @@
void forEachArg(Air::Inst&, const ScopedLambda<Air::Inst::EachArgCallback>&) override;
bool isValid(Air::Inst&) override;
bool admitsStack(Air::Inst&, unsigned argIndex) override;
+ bool shouldTryAliasingDef(Air::Inst&, unsigned& defIndex) override;
// NOTE: the generate method will generate the hidden branch and then register a LatePath that
// generates the stackmap. Super crazy dude!
Modified: trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp 2016-02-12 22:32:44 UTC (rev 196513)
@@ -2146,7 +2146,10 @@
} else if (imm(right) && isValidForm(opcode, Arg::ResCond, Arg::Imm, Arg::Tmp)) {
sources.append(imm(right));
append(Move, tmp(left), result);
- } else if (isValidForm(opcode, Arg::ResCond, Arg::Tmp, Arg::Tmp)) {
+ } else if (isValidForm(opcode, Arg::ResCond, Arg::Tmp, Arg::Tmp, Arg::Tmp)) {
+ sources.append(tmp(left));
+ sources.append(tmp(right));
+ } else if (isValidForm(opcode, Arg::ResCond, Arg::Tmp, Arg::Tmp)) {
if (commutativity == Commutative && preferRightForResult(left, right)) {
sources.append(tmp(left));
append(Move, tmp(right), result);
Modified: trunk/Source/JavaScriptCore/b3/air/AirCustom.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/air/AirCustom.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/air/AirCustom.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -80,6 +80,11 @@
return inst.args[0].special()->admitsStack(inst, argIndex);
}
+ static bool shouldTryAliasingDef(Inst& inst, unsigned& defIndex)
+ {
+ return inst.args[0].special()->shouldTryAliasingDef(inst, defIndex);
+ }
+
static bool hasNonArgNonControlEffects(Inst& inst)
{
return inst.args[0].special()->hasNonArgNonControlEffects();
Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/air/AirInst.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -180,6 +180,12 @@
// case being fall-through. This function is auto-generated by opcode_generator.rb.
CCallHelpers::Jump generate(CCallHelpers&, GenerationContext&);
+ // Returns true if the register allocator should attempt to alias the arguments with the destination
+ // for this instruction.
+ // If the method returns true, defIndex is set to the index of the destination argument. The indices
+ // (defIndex - 1) and (defIndex - 2) are the one to alias to defIndex.
+ bool shouldTryAliasingDef(unsigned& defIndex);
+
void dump(PrintStream&) const;
ArgList args;
Modified: trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -164,6 +164,50 @@
return admitsStack(&arg - &args[0]);
}
+inline bool Inst::shouldTryAliasingDef(unsigned& defIndex)
+{
+ if (!isX86())
+ return false;
+
+ switch (opcode) {
+ case Add32:
+ case Add64:
+ case And32:
+ case And64:
+ case Mul32:
+ case Mul64:
+ case Or32:
+ case Or64:
+ case Xor32:
+ case Xor64:
+ case AddDouble:
+ case AddFloat:
+ case AndFloat:
+ case AndDouble:
+ case MulDouble:
+ case MulFloat:
+ case XorDouble:
+ case XorFloat:
+ if (args.size() == 3) {
+ defIndex = 2;
+ return true;
+ }
+ break;
+ case BranchAdd32:
+ case BranchAdd64:
+ if (args.size() == 4) {
+ defIndex = 3;
+ return true;
+ }
+ break;
+ case Patch:
+ return PatchCustom::shouldTryAliasingDef(*this, defIndex);
+ default:
+ break;
+ }
+ return false;
+}
+
inline bool isShiftValid(const Inst& inst)
{
#if CPU(X86) || CPU(X86_64)
Modified: trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp 2016-02-12 22:32:44 UTC (rev 196513)
@@ -169,7 +169,9 @@
if (traceDebug)
dataLog(" Coalesced\n");
- } else if (isPrecolored(v) || m_interferenceEdges.contains(InterferenceEdge(u, v))) {
+ } else if (isPrecolored(v)
+ || m_interferenceEdges.contains(InterferenceEdge(u, v))
+ || (u == m_framePointerIndex && m_interferesWithFramePointer.quickGet(v))) {
addWorkList(u);
addWorkList(v);
@@ -399,6 +401,9 @@
}
});
+ if (m_framePointerIndex && m_interferesWithFramePointer.quickGet(v))
+ m_interferesWithFramePointer.quickSet(u);
+
if (m_degrees[u] >= m_regsInPriorityOrder.size() && m_freezeWorklist.remove(u))
addToSpill(u);
}
@@ -574,18 +579,45 @@
BitVector m_isOnSelectStack;
Vector<IndexType> m_selectStack;
+ IndexType m_framePointerIndex { 0 };
+ BitVector m_interferesWithFramePointer;
+
struct OrderedMoveSet {
unsigned addMove()
{
- unsigned nextIndex = m_moveList.size();
+ ASSERT(m_lowPriorityMoveList.isEmpty());
+ ASSERT(!m_firstLowPriorityMoveIndex);
+
+ unsigned nextIndex = m_positionInMoveList.size();
+ unsigned position = m_moveList.size();
m_moveList.append(nextIndex);
- m_positionInMoveList.append(nextIndex);
+ m_positionInMoveList.append(position);
return nextIndex;
}
+ void startAddingLowPriorityMoves()
+ {
+ ASSERT(m_lowPriorityMoveList.isEmpty());
+ m_firstLowPriorityMoveIndex = m_moveList.size();
+ }
+
+ unsigned addLowPriorityMove()
+ {
+ ASSERT(m_firstLowPriorityMoveIndex == m_moveList.size());
+
+ unsigned nextIndex = m_positionInMoveList.size();
+ unsigned position = m_lowPriorityMoveList.size();
+ m_lowPriorityMoveList.append(nextIndex);
+ m_positionInMoveList.append(position);
+
+ ASSERT(nextIndex >= m_firstLowPriorityMoveIndex);
+
+ return nextIndex;
+ }
+
bool isEmpty() const
{
- return m_moveList.isEmpty();
+ return m_moveList.isEmpty() && m_lowPriorityMoveList.isEmpty();
}
bool contains(unsigned index)
@@ -599,11 +631,19 @@
if (positionInMoveList == std::numeric_limits<unsigned>::max())
return;
- ASSERT(m_moveList[positionInMoveList] == moveIndex);
- unsigned lastIndex = m_moveList.last();
- m_positionInMoveList[lastIndex] = positionInMoveList;
- m_moveList[positionInMoveList] = lastIndex;
- m_moveList.removeLast();
+ if (moveIndex < m_firstLowPriorityMoveIndex) {
+ ASSERT(m_moveList[positionInMoveList] == moveIndex);
+ unsigned lastIndex = m_moveList.last();
+ m_positionInMoveList[lastIndex] = positionInMoveList;
+ m_moveList[positionInMoveList] = lastIndex;
+ m_moveList.removeLast();
+ } else {
+ ASSERT(m_lowPriorityMoveList[positionInMoveList] == moveIndex);
+ unsigned lastIndex = m_lowPriorityMoveList.last();
+ m_positionInMoveList[lastIndex] = positionInMoveList;
+ m_lowPriorityMoveList[positionInMoveList] = lastIndex;
+ m_lowPriorityMoveList.removeLast();
+ }
m_positionInMoveList[moveIndex] = std::numeric_limits<unsigned>::max();
@@ -614,8 +654,14 @@
{
ASSERT(!isEmpty());
- unsigned lastIndex = m_moveList.takeLast();
- ASSERT(m_positionInMoveList[lastIndex] == m_moveList.size());
+ unsigned lastIndex;
+ if (!m_moveList.isEmpty()) {
+ lastIndex = m_moveList.takeLast();
+ ASSERT(m_positionInMoveList[lastIndex] == m_moveList.size());
+ } else {
+ lastIndex = m_lowPriorityMoveList.takeLast();
+ ASSERT(m_positionInMoveList[lastIndex] == m_lowPriorityMoveList.size());
+ }
m_positionInMoveList[lastIndex] = std::numeric_limits<unsigned>::max();
ASSERT(!contains(lastIndex));
@@ -629,9 +675,15 @@
// Values should not be added back if they were never taken out when attempting coalescing.
ASSERT(!contains(index));
- unsigned position = m_moveList.size();
- m_moveList.append(index);
- m_positionInMoveList[index] = position;
+ if (index < m_firstLowPriorityMoveIndex) {
+ unsigned position = m_moveList.size();
+ m_moveList.append(index);
+ m_positionInMoveList[index] = position;
+ } else {
+ unsigned position = m_lowPriorityMoveList.size();
+ m_lowPriorityMoveList.append(index);
+ m_positionInMoveList[index] = position;
+ }
ASSERT(contains(index));
}
@@ -640,11 +692,14 @@
{
m_positionInMoveList.clear();
m_moveList.clear();
+ m_lowPriorityMoveList.clear();
}
private:
Vector<unsigned, 0, UnsafeVectorOverflow> m_positionInMoveList;
Vector<unsigned, 0, UnsafeVectorOverflow> m_moveList;
+ Vector<unsigned, 0, UnsafeVectorOverflow> m_lowPriorityMoveList;
+ unsigned m_firstLowPriorityMoveIndex { 0 };
};
// Work lists.
@@ -678,6 +733,11 @@
, m_tmpWidth(tmpWidth)
, m_useCounts(useCounts)
{
+ if (type == Arg::GP) {
+ m_framePointerIndex = AbsoluteTmpMapper<type>::absoluteIndex(Tmp(MacroAssembler::framePointerRegister));
+ m_interferesWithFramePointer.ensureSize(tmpArraySize(code));
+ }
+
initializePrecoloredTmp();
build();
allocate();
@@ -802,6 +862,48 @@
}
}
+ bool mayBeCoalesced(Arg left, Arg right)
+ {
+ if (!left.isTmp() || !right.isTmp())
+ return false;
+
+ Tmp leftTmp = left.tmp();
+ Tmp rightTmp = right.tmp();
+
+ if (leftTmp == rightTmp)
+ return false;
+
+ if (leftTmp.isGP() != (type == Arg::GP) || rightTmp.isGP() != (type == Arg::GP))
+ return false;
+
+ unsigned leftIndex = AbsoluteTmpMapper<type>::absoluteIndex(leftTmp);
+ unsigned rightIndex = AbsoluteTmpMapper<type>::absoluteIndex(rightTmp);
+
+ return !m_interferenceEdges.contains(InterferenceEdge(leftIndex, rightIndex));
+ }
+
+ void addToLowPriorityCoalescingCandidates(Arg left, Arg right)
+ {
+ ASSERT(mayBeCoalesced(left, right));
+ Tmp leftTmp = left.tmp();
+ Tmp rightTmp = right.tmp();
+
+ unsigned leftIndex = AbsoluteTmpMapper<type>::absoluteIndex(leftTmp);
+ unsigned rightIndex = AbsoluteTmpMapper<type>::absoluteIndex(rightTmp);
+
+ unsigned nextMoveIndex = m_coalescingCandidates.size();
+ m_coalescingCandidates.append({ leftIndex, rightIndex });
+
+ unsigned newIndexInWorklist = m_worklistMoves.addLowPriorityMove();
+ ASSERT_UNUSED(newIndexInWorklist, newIndexInWorklist == nextMoveIndex);
+
+ ASSERT(nextMoveIndex <= m_activeMoves.size());
+ m_activeMoves.ensureSize(nextMoveIndex + 1);
+
+ m_moveList[leftIndex].add(nextMoveIndex);
+ m_moveList[rightIndex].add(nextMoveIndex);
+ }
+
void build()
{
TmpLiveness<type> liveness(m_code);
@@ -815,6 +917,7 @@
}
build(nullptr, &block->at(0), localCalc);
}
+ buildLowPriorityMoveList();
}
void build(Inst* prevInst, Inst* nextInst, const typename TmpLiveness<type>::LocalCalc& localCalc)
@@ -881,6 +984,32 @@
addEdges(prevInst, nextInst, localCalc.live());
}
+ void buildLowPriorityMoveList()
+ {
+ if (!isX86())
+ return;
+
+ m_worklistMoves.startAddingLowPriorityMoves();
+ for (BasicBlock* block : m_code) {
+ for (Inst& inst : *block) {
+ unsigned defArgIndex = 0;
+ if (inst.shouldTryAliasingDef(defArgIndex)) {
+ Arg op1 = inst.args[defArgIndex - 2];
+ Arg op2 = inst.args[defArgIndex - 1];
+ Arg dest = inst.args[defArgIndex];
+
+ if (op1 == dest || op2 == dest)
+ continue;
+
+ if (mayBeCoalesced(op1, dest))
+ addToLowPriorityCoalescingCandidates(op1, dest);
+ if (op1 != op2 && mayBeCoalesced(op2, dest))
+ addToLowPriorityCoalescingCandidates(op2, dest);
+ }
+ }
+ }
+ }
+
void addEdges(Inst* prevInst, Inst* nextInst, typename TmpLiveness<type>::LocalCalc::Iterable liveTmps)
{
// All the Def()s interfere with everthing live.
@@ -895,11 +1024,8 @@
addEdge(arg, liveTmp);
}
- if (type == Arg::GP && !arg.isGPR()) {
- m_interferenceEdges.add(InterferenceEdge(
- AbsoluteTmpMapper<type>::absoluteIndex(Tmp(MacroAssembler::framePointerRegister)),
- AbsoluteTmpMapper<type>::absoluteIndex(arg)));
- }
+ if (type == Arg::GP && !arg.isGPR())
+ m_interferesWithFramePointer.quickSet(AbsoluteTmpMapper<type>::absoluteIndex(arg));
});
}
@@ -1032,6 +1158,11 @@
if (debug) {
dataLog("Interference: ", listDump(m_interferenceEdges), "\n");
dumpInterferenceGraphInDot(WTF::dataFile());
+ dataLog("Coalescing candidates:\n");
+ for (MoveOperands& moveOp : m_coalescingCandidates) {
+ dataLog(" ", AbsoluteTmpMapper<type>::tmpFromAbsoluteIndex(moveOp.srcIndex),
+ " -> ", AbsoluteTmpMapper<type>::tmpFromAbsoluteIndex(moveOp.dstIndex), "\n");
+ }
dataLog("Initial work list\n");
dumpWorkLists(WTF::dataFile());
}
@@ -1131,7 +1262,7 @@
template<Arg::Type type>
void iteratedRegisterCoalescingOnType()
{
- HashSet<unsigned> unspillableTmps;
+ HashSet<unsigned> unspillableTmps = computeUnspillableTmps<type>();
// FIXME: If a Tmp is used only from a Scratch role and that argument is !admitsStack, then
// we should add the Tmp to unspillableTmps. That will help avoid relooping only to turn the
@@ -1172,6 +1303,72 @@
}
template<Arg::Type type>
+ HashSet<unsigned> computeUnspillableTmps()
+ {
+ HashSet<unsigned> unspillableTmps;
+
+ struct Range {
+ unsigned first { std::numeric_limits<unsigned>::max() };
+ unsigned last { 0 };
+ unsigned count { 0 };
+ unsigned admitStackCount { 0 };
+ };
+
+ unsigned numTmps = m_code.numTmps(type);
+ unsigned arraySize = AbsoluteTmpMapper<type>::absoluteIndex(numTmps);
+
+ Vector<Range, 0, UnsafeVectorOverflow> ranges;
+ ranges.fill(Range(), arraySize);
+
+ unsigned globalIndex = 0;
+ for (BasicBlock* block : m_code) {
+ for (Inst& inst : *block) {
+ inst.forEachArg([&] (Arg& arg, Arg::Role, Arg::Type argType, Arg::Width) {
+ if (arg.isTmp() && inst.admitsStack(arg)) {
+ if (argType != type)
+ return;
+
+ Tmp tmp = arg.tmp();
+ Range& range = ranges[AbsoluteTmpMapper<type>::absoluteIndex(tmp)];
+ range.count++;
+ range.admitStackCount++;
+ if (globalIndex < range.first) {
+ range.first = globalIndex;
+ range.last = globalIndex;
+ } else
+ range.last = globalIndex;
+
+ return;
+ }
+
+ arg.forEachTmpFast([&] (Tmp& tmp) {
+ if (tmp.isGP() != (type == Arg::GP))
+ return;
+
+ Range& range = ranges[AbsoluteTmpMapper<type>::absoluteIndex(tmp)];
+ range.count++;
+ if (globalIndex < range.first) {
+ range.first = globalIndex;
+ range.last = globalIndex;
+ } else
+ range.last = globalIndex;
+ });
+ });
+
+ ++globalIndex;
+ }
+ ++globalIndex;
+ }
+ for (unsigned i = AbsoluteTmpMapper<type>::lastMachineRegisterIndex() + 1; i < ranges.size(); ++i) {
+ Range& range = ranges[i];
+ if (range.last - range.first <= 1 && range.count > range.admitStackCount)
+ unspillableTmps.add(i);
+ }
+
+ return unspillableTmps;
+ }
+
+ template<Arg::Type type>
void assignRegistersToTmp(const ColoringAllocator<type>& allocator)
{
for (BasicBlock* block : m_code) {
Modified: trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes 2016-02-12 22:32:44 UTC (rev 196513)
@@ -107,6 +107,10 @@
Nop
+Add32 U:G:32, U:G:32, ZD:G:32
+ Imm, Tmp, Tmp
+ Tmp, Tmp, Tmp
+
Add32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Addr
@@ -128,10 +132,6 @@
Tmp, Addr
Tmp, Index
-Add32 U:G:32, U:G:32, ZD:G:32
- Imm, Tmp, Tmp
- Tmp, Tmp, Tmp
-
64: Add64 U:G:64, UD:G:64
Tmp, Tmp
x86: Imm, Addr
@@ -143,15 +143,19 @@
Imm, Tmp, Tmp
Tmp, Tmp, Tmp
-arm64: AddDouble U:F:64, U:F:64, D:F:64
+AddDouble U:F:64, U:F:64, D:F:64
Tmp, Tmp, Tmp
+ x86: Addr, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
x86: AddDouble U:F:64, UD:F:64
Tmp, Tmp
Addr, Tmp
-arm64: AddFloat U:F:32, U:F:32, D:F:32
+AddFloat U:F:32, U:F:32, D:F:32
Tmp, Tmp, Tmp
+ x86: Addr, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
x86: AddFloat U:F:32, UD:F:32
Tmp, Tmp
@@ -200,13 +204,15 @@
x86: Addr, Tmp
Mul32 U:G:32, U:G:32, ZD:G:32
- arm64: Tmp, Tmp, Tmp
+ Tmp, Tmp, Tmp
+ x86: Addr, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
x86: Imm, Tmp, Tmp
64: Mul64 U:G:64, UD:G:64
Tmp, Tmp
-arm64: Mul64 U:G:64, U:G:64, D:G:64
+Mul64 U:G:64, U:G:64, D:G:64
Tmp, Tmp, Tmp
arm64: Div32 U:G:32, U:G:32, ZD:G:32
@@ -215,15 +221,19 @@
arm64: Div64 U:G:64, U:G:64, D:G:64
Tmp, Tmp, Tmp
-arm64: MulDouble U:F:64, U:F:64, D:F:64
+MulDouble U:F:64, U:F:64, D:F:64
Tmp, Tmp, Tmp
+ x86: Addr, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
x86: MulDouble U:F:64, UD:F:64
Tmp, Tmp
Addr, Tmp
-arm64: MulFloat U:F:32, U:F:32, D:F:32
+MulFloat U:F:32, U:F:32, D:F:32
Tmp, Tmp, Tmp
+ x86: Addr, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
x86: MulFloat U:F:32, UD:F:32
Tmp, Tmp
@@ -258,6 +268,11 @@
Lea UA:G:Ptr, D:G:Ptr
Addr, Tmp
+And32 U:G:32, U:G:32, ZD:G:32
+ Tmp, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
+ x86: Addr, Tmp, Tmp
+
And32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Tmp
@@ -265,25 +280,34 @@
x86: Addr, Tmp
x86: Imm, Addr
-64: And64 U:G:64, UD:G:64
+64: And64 U:G:64, U:G:64, D:G:64
+ Tmp, Tmp, Tmp
+
+x86_64: And64 U:G:64, UD:G:64
Tmp, Tmp
x86: Imm, Tmp
-arm64: AndDouble U:F:64, U:F:64, D:F:64
+AndDouble U:F:64, U:F:64, D:F:64
Tmp, Tmp, Tmp
x86: AndDouble U:F:64, UD:F:64
Tmp, Tmp
-arm64: AndFloat U:F:32, U:F:32, D:F:32
+AndFloat U:F:32, U:F:32, D:F:32
Tmp, Tmp, Tmp
x86: AndFloat U:F:32, UD:F:32
Tmp, Tmp
+x86: XorDouble U:F:64, U:F:64, D:F:64
+ Tmp, Tmp, Tmp
+
x86: XorDouble U:F:64, UD:F:64
Tmp, Tmp
+x86: XorFloat U:F:32, U:F:32, D:F:32
+ Tmp, Tmp, Tmp
+
x86: XorFloat U:F:32, UD:F:32
Tmp, Tmp
@@ -335,6 +359,11 @@
Tmp*, Tmp
Imm, Tmp
+Or32 U:G:32, U:G:32, ZD:G:32
+ Tmp, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
+ x86: Addr, Tmp, Tmp
+
Or32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Tmp
@@ -342,10 +371,18 @@
x86: Addr, Tmp
x86: Imm, Addr
+64: Or64 U:G:64, U:G:64, D:G:64
+ Tmp, Tmp, Tmp
+
64: Or64 U:G:64, UD:G:64
Tmp, Tmp
x86: Imm, Tmp
+Xor32 U:G:32, U:G:32, ZD:G:32
+ Tmp, Tmp, Tmp
+ x86: Tmp, Addr, Tmp
+ x86: Addr, Tmp, Tmp
+
Xor32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Tmp
@@ -353,6 +390,9 @@
x86: Addr, Tmp
x86: Imm, Addr
+64: Xor64 U:G:64, U:G:64, D:G:64
+ Tmp, Tmp, Tmp
+
64: Xor64 U:G:64, UD:G:64
Tmp, Tmp
x86: Tmp, Addr
@@ -609,6 +649,11 @@
BranchFloat U:G:32, U:F:32, U:F:32 /branch
DoubleCond, Tmp, Tmp
+BranchAdd32 U:G:32, U:G:32, U:G:32, ZD:G:32 /branch
+ ResCond, Tmp, Tmp, Tmp
+ x86:ResCond, Tmp, Addr, Tmp
+ x86:ResCond, Addr, Tmp, Tmp
+
BranchAdd32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
ResCond, Imm, Tmp
@@ -616,9 +661,15 @@
x86: ResCond, Tmp, Addr
x86: ResCond, Addr, Tmp
+BranchAdd64 U:G:32, U:G:64, U:G:64, ZD:G:64 /branch
+ ResCond, Tmp, Tmp, Tmp
+ x86:ResCond, Tmp, Addr, Tmp
+ x86:ResCond, Addr, Tmp, Tmp
+
64: BranchAdd64 U:G:32, U:G:64, UD:G:64 /branch
ResCond, Imm, Tmp
ResCond, Tmp, Tmp
+ x86:ResCond, Addr, Tmp
x86: BranchMul32 U:G:32, U:G:32, UZD:G:32 /branch
ResCond, Tmp, Tmp
Modified: trunk/Source/JavaScriptCore/b3/air/AirSpecial.cpp (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/air/AirSpecial.cpp 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/air/AirSpecial.cpp 2016-02-12 22:32:44 UTC (rev 196513)
@@ -50,6 +50,11 @@
return out.toCString();
}
+bool Special::shouldTryAliasingDef(Inst&, unsigned&)
+{
+ return false;
+}
+
bool Special::hasNonArgNonControlEffects()
{
return true;
Modified: trunk/Source/JavaScriptCore/b3/air/AirSpecial.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/air/AirSpecial.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/air/AirSpecial.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -56,6 +56,7 @@
virtual void forEachArg(Inst&, const ScopedLambda<Inst::EachArgCallback>&) = 0;
virtual bool isValid(Inst&) = 0;
virtual bool admitsStack(Inst&, unsigned argIndex) = 0;
+ virtual bool shouldTryAliasingDef(Inst&, unsigned& defIndex);
// This gets called on for each Inst that uses this Special. Note that there is no way to
// guarantee that a Special gets used from just one Inst, because Air might taildup late. So,
Modified: trunk/Source/JavaScriptCore/b3/testb3.cpp (196512 => 196513)
--- trunk/Source/JavaScriptCore/b3/testb3.cpp 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/b3/testb3.cpp 2016-02-12 22:32:44 UTC (rev 196513)
@@ -7492,6 +7492,146 @@
CHECK(invoke<int>(*code) == 42);
}
+void testCheckAddArgumentAliasing64()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* arg1 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+ Value* arg2 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
+ Value* arg3 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR2);
+
+ // Pretend to use all the args.
+ PatchpointValue* useArgs = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ useArgs->append(ConstrainedValue(arg1, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg3, ValueRep::SomeRegister));
+ useArgs->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Last use of first arg (here, arg1).
+ CheckValue* checkAdd1 = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg1, arg2);
+ checkAdd1->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Last use of second arg (here, arg2).
+ CheckValue* checkAdd2 = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg3, arg2);
+ checkAdd2->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Keep arg3 live.
+ PatchpointValue* keepArg2Live = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ keepArg2Live->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ keepArg2Live->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Only use of checkAdd1 and checkAdd2.
+ CheckValue* checkAdd3 = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), checkAdd1, checkAdd2);
+ checkAdd3->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ root->appendNew<ControlValue>(proc, Return, Origin(), checkAdd3);
+
+ CHECK(compileAndRun<int64_t>(proc, 1, 2, 3) == 8);
+}
+
+void testCheckAddArgumentAliasing32()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* arg1 = root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+ Value* arg2 = root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1));
+ Value* arg3 = root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR2));
+
+ // Pretend to use all the args.
+ PatchpointValue* useArgs = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ useArgs->append(ConstrainedValue(arg1, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg3, ValueRep::SomeRegister));
+ useArgs->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Last use of first arg (here, arg1).
+ CheckValue* checkAdd1 = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg1, arg2);
+ checkAdd1->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Last use of second arg (here, arg3).
+ CheckValue* checkAdd2 = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg2, arg3);
+ checkAdd2->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Keep arg3 live.
+ PatchpointValue* keepArg2Live = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ keepArg2Live->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ keepArg2Live->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Only use of checkAdd1 and checkAdd2.
+ CheckValue* checkAdd3 = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), checkAdd1, checkAdd2);
+ checkAdd3->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ root->appendNew<ControlValue>(proc, Return, Origin(), checkAdd3);
+
+ CHECK(compileAndRun<int32_t>(proc, 1, 2, 3) == 8);
+}
+
+void testCheckAddSelfOverflow64()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* arg = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+ CheckValue* checkAdd = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg, arg);
+ checkAdd->append(arg);
+ checkAdd->setGenerator(
+ [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+ AllowMacroScratchRegisterUsage allowScratch(jit);
+ jit.move(params[0].gpr(), GPRInfo::returnValueGPR);
+ jit.emitFunctionEpilogue();
+ jit.ret();
+ });
+
+ // Make sure the arg is not the destination of the operation.
+ PatchpointValue* opaqueUse = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ opaqueUse->append(ConstrainedValue(arg, ValueRep::SomeRegister));
+ opaqueUse->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ root->appendNew<ControlValue>(proc, Return, Origin(), checkAdd);
+
+ auto code = compile(proc);
+
+ CHECK(invoke<int64_t>(*code, 0ll) == 0);
+ CHECK(invoke<int64_t>(*code, 1ll) == 2);
+ CHECK(invoke<int64_t>(*code, std::numeric_limits<int64_t>::max()) == std::numeric_limits<int64_t>::max());
+}
+
+void testCheckAddSelfOverflow32()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* arg = root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+ CheckValue* checkAdd = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg, arg);
+ checkAdd->append(arg);
+ checkAdd->setGenerator(
+ [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+ AllowMacroScratchRegisterUsage allowScratch(jit);
+ jit.move(params[0].gpr(), GPRInfo::returnValueGPR);
+ jit.emitFunctionEpilogue();
+ jit.ret();
+ });
+
+ // Make sure the arg is not the destination of the operation.
+ PatchpointValue* opaqueUse = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ opaqueUse->append(ConstrainedValue(arg, ValueRep::SomeRegister));
+ opaqueUse->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ root->appendNew<ControlValue>(proc, Return, Origin(), checkAdd);
+
+ auto code = compile(proc);
+
+ CHECK(invoke<int32_t>(*code, 0ll) == 0);
+ CHECK(invoke<int32_t>(*code, 1ll) == 2);
+ CHECK(invoke<int32_t>(*code, std::numeric_limits<int32_t>::max()) == std::numeric_limits<int32_t>::max());
+}
+
void testCheckSubImm()
{
Procedure proc;
@@ -7943,6 +8083,86 @@
CHECK(invoke<int>(*code) == 42);
}
+void testCheckMulArgumentAliasing64()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* arg1 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+ Value* arg2 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
+ Value* arg3 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR2);
+
+ // Pretend to use all the args.
+ PatchpointValue* useArgs = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ useArgs->append(ConstrainedValue(arg1, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg3, ValueRep::SomeRegister));
+ useArgs->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Last use of first arg (here, arg1).
+ CheckValue* checkMul1 = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg1, arg2);
+ checkMul1->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Last use of second arg (here, arg2).
+ CheckValue* checkMul2 = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg3, arg2);
+ checkMul2->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Keep arg3 live.
+ PatchpointValue* keepArg2Live = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ keepArg2Live->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ keepArg2Live->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Only use of checkMul1 and checkMul2.
+ CheckValue* checkMul3 = root->appendNew<CheckValue>(proc, CheckMul, Origin(), checkMul1, checkMul2);
+ checkMul3->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ root->appendNew<ControlValue>(proc, Return, Origin(), checkMul3);
+
+ CHECK(compileAndRun<int64_t>(proc, 2, 3, 4) == 72);
+}
+
+void testCheckMulArgumentAliasing32()
+{
+ Procedure proc;
+ BasicBlock* root = proc.addBlock();
+ Value* arg1 = root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+ Value* arg2 = root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1));
+ Value* arg3 = root->appendNew<Value>(
+ proc, Trunc, Origin(),
+ root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR2));
+
+ // Pretend to use all the args.
+ PatchpointValue* useArgs = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ useArgs->append(ConstrainedValue(arg1, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ useArgs->append(ConstrainedValue(arg3, ValueRep::SomeRegister));
+ useArgs->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Last use of first arg (here, arg1).
+ CheckValue* checkMul1 = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg1, arg2);
+ checkMul1->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Last use of second arg (here, arg3).
+ CheckValue* checkMul2 = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg2, arg3);
+ checkMul2->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ // Keep arg3 live.
+ PatchpointValue* keepArg2Live = root->appendNew<PatchpointValue>(proc, Void, Origin());
+ keepArg2Live->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
+ keepArg2Live->setGenerator([&] (CCallHelpers&, const StackmapGenerationParams&) { });
+
+ // Only use of checkMul1 and checkMul2.
+ CheckValue* checkMul3 = root->appendNew<CheckValue>(proc, CheckMul, Origin(), checkMul1, checkMul2);
+ checkMul3->setGenerator([&] (CCallHelpers& jit, const StackmapGenerationParams&) { jit.oops(); });
+
+ root->appendNew<ControlValue>(proc, Return, Origin(), checkMul3);
+
+ CHECK(compileAndRun<int32_t>(proc, 2, 3, 4) == 72);
+}
+
void testCheckMul64SShr()
{
Procedure proc;
@@ -11061,6 +11281,10 @@
RUN(testCheckAdd64());
RUN(testCheckAddFold(100, 200));
RUN(testCheckAddFoldFail(2147483647, 100));
+ RUN(testCheckAddArgumentAliasing64());
+ RUN(testCheckAddArgumentAliasing32());
+ RUN(testCheckAddSelfOverflow64());
+ RUN(testCheckAddSelfOverflow32());
RUN(testCheckSubImm());
RUN(testCheckSubBadImm());
RUN(testCheckSub());
@@ -11075,6 +11299,8 @@
RUN(testCheckMul64());
RUN(testCheckMulFold(100, 200));
RUN(testCheckMulFoldFail(2147483647, 100));
+ RUN(testCheckMulArgumentAliasing64());
+ RUN(testCheckMulArgumentAliasing32());
RUN(testCompare(Equal, 42, 42));
RUN(testCompare(NotEqual, 42, 42));
@@ -11604,6 +11830,7 @@
RUN(testSShrShl64(-42000000000, 8, 8));
RUN(testCheckMul64SShr());
+
RUN(testComputeDivisionMagic<int32_t>(2, -2147483647, 0));
RUN(testTrivialInfiniteLoop());
RUN(testFoldPathEqual());
Modified: trunk/Source/JavaScriptCore/dfg/DFGOSRExitCompilerCommon.cpp (196512 => 196513)
--- trunk/Source/JavaScriptCore/dfg/DFGOSRExitCompilerCommon.cpp 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/dfg/DFGOSRExitCompilerCommon.cpp 2016-02-12 22:32:44 UTC (rev 196513)
@@ -213,7 +213,7 @@
jit.emitSaveOrCopyCalleeSavesFor(
baselineCodeBlock,
static_cast<VirtualRegister>(inlineCallFrame->stackOffset),
- trueCaller ? AssemblyHelpers::UseExistingTagRegisterContents : AssemblyHelpers::CopySavedTagRegistersFromBaseFrame,
+ trueCaller ? AssemblyHelpers::UseExistingTagRegisterContents : AssemblyHelpers::CopyBaselineCalleeSavedRegistersFromBaseFrame,
GPRInfo::regT2);
if (!inlineCallFrame->isVarargs())
Modified: trunk/Source/JavaScriptCore/jit/AssemblyHelpers.h (196512 => 196513)
--- trunk/Source/JavaScriptCore/jit/AssemblyHelpers.h 2016-02-12 22:31:08 UTC (rev 196512)
+++ trunk/Source/JavaScriptCore/jit/AssemblyHelpers.h 2016-02-12 22:32:44 UTC (rev 196513)
@@ -213,7 +213,7 @@
}
}
- enum RestoreTagRegisterMode { UseExistingTagRegisterContents, CopySavedTagRegistersFromBaseFrame };
+ enum RestoreTagRegisterMode { UseExistingTagRegisterContents, CopyBaselineCalleeSavedRegistersFromBaseFrame };
void emitSaveOrCopyCalleeSavesFor(CodeBlock* codeBlock, VirtualRegister offsetVirtualRegister, RestoreTagRegisterMode tagRegisterMode, GPRReg temp)
{
@@ -222,6 +222,10 @@
RegisterAtOffsetList* calleeSaves = codeBlock->calleeSaveRegisters();
RegisterSet dontSaveRegisters = RegisterSet(RegisterSet::stackRegisters(), RegisterSet::allFPRs());
unsigned registerCount = calleeSaves->size();
+
+#if USE(JSVALUE64)
+ RegisterSet baselineCalleeSaves = RegisterSet::llintBaselineCalleeSaveRegisters();
+#endif
for (unsigned i = 0; i < registerCount; i++) {
RegisterAtOffset entry = calleeSaves->at(i);
@@ -234,8 +238,7 @@
UNUSED_PARAM(tagRegisterMode);
UNUSED_PARAM(temp);
#else
- if (tagRegisterMode == CopySavedTagRegistersFromBaseFrame
- && (entry.reg() == GPRInfo::tagTypeNumberRegister || entry.reg() == GPRInfo::tagMaskRegister)) {
+ if (tagRegisterMode == CopyBaselineCalleeSavedRegistersFromBaseFrame && baselineCalleeSaves.get(entry.reg())) {
registerToWrite = temp;
loadPtr(AssemblyHelpers::Address(GPRInfo::callFrameRegister, entry.offset()), registerToWrite);
} else