Revision: 211670
Author: [email protected]
Date: 2017-02-04 05:46:19 -0800 (Sat, 04 Feb 2017)
Log Message
[JSC] Add operationToInt32SensibleSlow to optimize kraken pbkdf2 and sha256
https://bugs.webkit.org/show_bug.cgi?id=167736
Reviewed by Saam Barati.
JSTests:
* stress/to-int32-sensible.js: Added.
(shouldBe):
(toInt32):
(test):
Source/JavaScriptCore:
Add a new function, operationToInt32SensibleSlow. It is called only
after the x86 cvttsd2si_rr truncation has failed, which means that the
given double is never in the range of numbers that truncate to int32.
As a result, exp in operationToInt32 is always >= 31, so we can
change the condition from `exp < 32` to `exp == 31`. This makes
missingOne a constant and leads to significantly better code
generation.
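To see why the narrower condition makes missingOne a constant, the two fix-up steps can be compared side by side. This is only an illustrative sketch: the helper names `fixupGeneric` and `fixupSensibleSlow` are hypothetical and do not appear in the patch, and unsigned arithmetic is used here to avoid questions about signed overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the fix-up step of operationToInt32:
// for exp < 32 the implicit leading one is re-inserted at bit `exp`
// and every bit above it is masked away.
uint32_t fixupGeneric(uint32_t result, int exp)
{
    assert(exp >= 0);
    if (exp < 32) {
        uint32_t missingOne = uint32_t{1} << exp; // value depends on exp
        result &= missingOne - 1;
        result += missingOne;
    }
    return result;
}

// Hypothetical helper mirroring the specialized step: the caller
// guarantees exp >= 31, so the only exponent below 32 is 31 and both
// the mask and the added bit are compile-time constants.
uint32_t fixupSensibleSlow(uint32_t result, int exp)
{
    assert(exp >= 31);
    if (exp == 31) {
        result &= 0x7fffffffu;
        result += 0x80000000u;
    }
    return result;
}
```

For every exp >= 31 the two helpers agree. The constant 0x80000000 appears to correspond to the `or $0x80000000,%ecx` and `cmove` pair in the operationToInt32SensibleSlow listing.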
The original operationToInt32 code.
170: 66 48 0f 7e c1 movq %xmm0,%rcx
175: 31 c0 xor %eax,%eax
177: 66 48 0f 7e c6 movq %xmm0,%rsi
17c: 48 c1 f9 34 sar $0x34,%rcx
180: 81 e1 ff 07 00 00 and $0x7ff,%ecx
186: 8d 91 01 fc ff ff lea -0x3ff(%rcx),%edx
18c: 83 fa 53 cmp $0x53,%edx
18f: 77 37 ja 1c8 <_ZN3JSC16operationToInt32Ed+0x58>
191: 83 fa 34 cmp $0x34,%edx
194: 7f 3a jg 1d0 <_ZN3JSC16operationToInt32Ed+0x60>
196: b9 34 00 00 00 mov $0x34,%ecx
19b: 66 48 0f 7e c7 movq %xmm0,%rdi
1a0: 29 d1 sub %edx,%ecx
1a2: 48 d3 ff sar %cl,%rdi
1a5: 83 fa 1f cmp $0x1f,%edx
1a8: 89 f8 mov %edi,%eax
1aa: 7f 12 jg 1be <_ZN3JSC16operationToInt32Ed+0x4e>
1ac: 89 d1 mov %edx,%ecx
1ae: b8 01 00 00 00 mov $0x1,%eax
1b3: d3 e0 shl %cl,%eax
1b5: 89 c2 mov %eax,%edx
1b7: 8d 40 ff lea -0x1(%rax),%eax
1ba: 21 f8 and %edi,%eax
1bc: 01 d0 add %edx,%eax
1be: 89 c2 mov %eax,%edx
1c0: f7 da neg %edx
1c2: 48 85 f6 test %rsi,%rsi
1c5: 0f 48 c2 cmovs %edx,%eax
1c8: f3 c3 repz retq
1ca: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1d0: 66 48 0f 7e c0 movq %xmm0,%rax
1d5: 81 e9 33 04 00 00 sub $0x433,%ecx
1db: 48 d3 e0 shl %cl,%rax
1de: eb de jmp 1be <_ZN3JSC16operationToInt32Ed+0x4e>
The operationToInt32SensibleSlow code.
1e0: 66 48 0f 7e c1 movq %xmm0,%rcx
1e5: 66 48 0f 7e c2 movq %xmm0,%rdx
1ea: 48 c1 f9 34 sar $0x34,%rcx
1ee: 81 e1 ff 07 00 00 and $0x7ff,%ecx
1f4: 8d b1 01 fc ff ff lea -0x3ff(%rcx),%esi
1fa: 83 fe 34 cmp $0x34,%esi
1fd: 7e 21 jle 220 <_ZN3JSC28operationToInt32SensibleSlowEd+0x40>
1ff: 66 48 0f 7e c0 movq %xmm0,%rax
204: 81 e9 33 04 00 00 sub $0x433,%ecx
20a: 48 d3 e0 shl %cl,%rax
20d: 89 c1 mov %eax,%ecx
20f: f7 d9 neg %ecx
211: 48 85 d2 test %rdx,%rdx
214: 0f 48 c1 cmovs %ecx,%eax
217: c3 retq
218: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
21f: 00
220: 66 48 0f 7e c0 movq %xmm0,%rax
225: b9 34 00 00 00 mov $0x34,%ecx
22a: 29 f1 sub %esi,%ecx
22c: 48 d3 f8 sar %cl,%rax
22f: 89 c1 mov %eax,%ecx
231: 81 c9 00 00 00 80 or $0x80000000,%ecx
237: 83 fe 1f cmp $0x1f,%esi
23a: 0f 44 c1 cmove %ecx,%eax
23d: 89 c1 mov %eax,%ecx
23f: f7 d9 neg %ecx
241: 48 85 d2 test %rdx,%rdx
244: 0f 48 c1 cmovs %ecx,%eax
247: c3 retq
248: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
24f: 00
This improves kraken pbkdf2 by 10.8% and sha256 by 7.5%.
                                      baseline              patched
stanford-crypto-pbkdf2             153.195+-2.745   ^   138.204+-2.513   ^ definitely 1.1085x faster
stanford-crypto-sha256-iterative    49.047+-1.038   ^    45.610+-1.235   ^ definitely 1.0754x faster
<arithmetic>                       101.121+-1.379   ^    91.907+-1.500   ^ definitely 1.1003x faster
* assembler/CPU.h:
(JSC::hasSensibleDoubleToInt):
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::compileValueToInt32):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::doubleToInt32):
(JSC::FTL::DFG::LowerDFGToB3::sensibleDoubleToInt32):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::hasSensibleDoubleToInt): Deleted.
* ftl/FTLOutput.h:
* runtime/MathCommon.cpp:
(JSC::operationToInt32SensibleSlow):
* runtime/MathCommon.h:
Diff
Modified: trunk/JSTests/ChangeLog (211669 => 211670)
--- trunk/JSTests/ChangeLog 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/JSTests/ChangeLog 2017-02-04 13:46:19 UTC (rev 211670)
@@ -1,3 +1,15 @@
+2017-02-04 Yusuke Suzuki <[email protected]>
+
+ [JSC] Add operationToInt32SensibleSlow to optimize kraken pbkdf2 and sha256
+ https://bugs.webkit.org/show_bug.cgi?id=167736
+
+ Reviewed by Saam Barati.
+
+ * stress/to-int32-sensible.js: Added.
+ (shouldBe):
+ (toInt32):
+ (test):
+
2017-02-01 Yusuke Suzuki <[email protected]>
Unreviewed, remove loop
Added: trunk/JSTests/stress/to-int32-sensible.js (0 => 211670)
--- trunk/JSTests/stress/to-int32-sensible.js (rev 0)
+++ trunk/JSTests/stress/to-int32-sensible.js 2017-02-04 13:46:19 UTC (rev 211670)
@@ -0,0 +1,40 @@
+function shouldBe(actual, expected) {
+ if (actual !== expected)
+ throw new Error('bad value: ' + actual);
+}
+
+// ValueToInt32(DoubleRep)
+function toInt32(number)
+{
+ return (number * 0.5) >> 0;
+}
+noInline(toInt32);
+for (var i = 0; i < 1e5; ++i)
+ toInt32(i * 1.0);
+
+function test(number)
+{
+ return toInt32(number * 2);
+}
+
+const INT32_MAX = 2147483647;
+const INT32_MIN = -2147483648;
+
+shouldBe(test(INT32_MAX - 1), INT32_MAX - 1);
+shouldBe(test(INT32_MAX - 0.5), INT32_MAX - 1);
+shouldBe(test(INT32_MAX), INT32_MAX);
+shouldBe(test(INT32_MAX + 0.5), INT32_MAX);
+shouldBe(test(INT32_MAX + 1), INT32_MIN);
+
+shouldBe(test(INT32_MIN - 1), INT32_MAX);
+shouldBe(test(INT32_MIN - 0.5), INT32_MIN);
+shouldBe(test(INT32_MIN), INT32_MIN);
+shouldBe(test(INT32_MIN + 0.5), INT32_MIN + 1);
+shouldBe(test(INT32_MIN + 1), INT32_MIN + 1);
+
+shouldBe(test(Number.EPSILON), 0);
+shouldBe(test(Number.NaN), 0);
+shouldBe(test(Number.POSITIVE_INFINITY), 0);
+shouldBe(test(Number.NEGATIVE_INFINITY), 0);
+shouldBe(test(Number.MAX_SAFE_INTEGER), -1);
+shouldBe(test(Number.MIN_SAFE_INTEGER), 1);
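The expected values in the test above follow from the ECMAScript ToInt32 rule: non-finite inputs give 0, and otherwise the value is truncated, reduced modulo 2^32, and mapped into the signed range. A minimal reference sketch in C++ (the engine's language) is shown here. The name `referenceToInt32` is hypothetical, and this floating-point formulation is not how JavaScriptCore implements the conversion.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical reference implementation of ECMAScript ToInt32, written
// for clarity rather than speed.
int32_t referenceToInt32(double number)
{
    if (!std::isfinite(number))
        return 0; // NaN, +Infinity and -Infinity all map to 0

    double truncated = std::trunc(number);               // round toward zero
    double wrapped = std::fmod(truncated, 4294967296.0); // in (-2^32, 2^32)
    if (wrapped < 0)
        wrapped += 4294967296.0;                         // now in [0, 2^32)
    if (wrapped >= 2147483648.0)
        wrapped -= 4294967296.0;                         // now in [-2^31, 2^31)
    return static_cast<int32_t>(wrapped);
}
```

For example, INT32_MAX + 1 wraps to INT32_MIN and INT32_MIN - 1 wraps to INT32_MAX, which is what the stress test asserts.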
Modified: trunk/Source/JavaScriptCore/ChangeLog (211669 => 211670)
--- trunk/Source/JavaScriptCore/ChangeLog 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/ChangeLog 2017-02-04 13:46:19 UTC (rev 211670)
@@ -1,3 +1,114 @@
+2017-02-04 Yusuke Suzuki <[email protected]>
+
+ [JSC] Add operationToInt32SensibleSlow to optimize kraken pbkdf2 and sha256
+ https://bugs.webkit.org/show_bug.cgi?id=167736
+
+ Reviewed by Saam Barati.
+
+ Add a new function, operationToInt32SensibleSlow. It is called only
+ after the x86 cvttsd2si_rr truncation has failed, which means that the
+ given double is never in the range of numbers that truncate to int32.
+
+ As a result, exp in operationToInt32 is always >= 31, so we can
+ change the condition from `exp < 32` to `exp == 31`. This makes
+ missingOne a constant and leads to significantly better code
+ generation.
+
+ The original operationToInt32 code.
+
+ 170: 66 48 0f 7e c1 movq %xmm0,%rcx
+ 175: 31 c0 xor %eax,%eax
+ 177: 66 48 0f 7e c6 movq %xmm0,%rsi
+ 17c: 48 c1 f9 34 sar $0x34,%rcx
+ 180: 81 e1 ff 07 00 00 and $0x7ff,%ecx
+ 186: 8d 91 01 fc ff ff lea -0x3ff(%rcx),%edx
+ 18c: 83 fa 53 cmp $0x53,%edx
+ 18f: 77 37 ja 1c8 <_ZN3JSC16operationToInt32Ed+0x58>
+ 191: 83 fa 34 cmp $0x34,%edx
+ 194: 7f 3a jg 1d0 <_ZN3JSC16operationToInt32Ed+0x60>
+ 196: b9 34 00 00 00 mov $0x34,%ecx
+ 19b: 66 48 0f 7e c7 movq %xmm0,%rdi
+ 1a0: 29 d1 sub %edx,%ecx
+ 1a2: 48 d3 ff sar %cl,%rdi
+ 1a5: 83 fa 1f cmp $0x1f,%edx
+ 1a8: 89 f8 mov %edi,%eax
+ 1aa: 7f 12 jg 1be <_ZN3JSC16operationToInt32Ed+0x4e>
+ 1ac: 89 d1 mov %edx,%ecx
+ 1ae: b8 01 00 00 00 mov $0x1,%eax
+ 1b3: d3 e0 shl %cl,%eax
+ 1b5: 89 c2 mov %eax,%edx
+ 1b7: 8d 40 ff lea -0x1(%rax),%eax
+ 1ba: 21 f8 and %edi,%eax
+ 1bc: 01 d0 add %edx,%eax
+ 1be: 89 c2 mov %eax,%edx
+ 1c0: f7 da neg %edx
+ 1c2: 48 85 f6 test %rsi,%rsi
+ 1c5: 0f 48 c2 cmovs %edx,%eax
+ 1c8: f3 c3 repz retq
+ 1ca: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
+ 1d0: 66 48 0f 7e c0 movq %xmm0,%rax
+ 1d5: 81 e9 33 04 00 00 sub $0x433,%ecx
+ 1db: 48 d3 e0 shl %cl,%rax
+ 1de: eb de jmp 1be <_ZN3JSC16operationToInt32Ed+0x4e>
+
+ The operationToInt32SensibleSlow code.
+
+ 1e0: 66 48 0f 7e c1 movq %xmm0,%rcx
+ 1e5: 66 48 0f 7e c2 movq %xmm0,%rdx
+ 1ea: 48 c1 f9 34 sar $0x34,%rcx
+ 1ee: 81 e1 ff 07 00 00 and $0x7ff,%ecx
+ 1f4: 8d b1 01 fc ff ff lea -0x3ff(%rcx),%esi
+ 1fa: 83 fe 34 cmp $0x34,%esi
+ 1fd: 7e 21 jle 220 <_ZN3JSC28operationToInt32SensibleSlowEd+0x40>
+ 1ff: 66 48 0f 7e c0 movq %xmm0,%rax
+ 204: 81 e9 33 04 00 00 sub $0x433,%ecx
+ 20a: 48 d3 e0 shl %cl,%rax
+ 20d: 89 c1 mov %eax,%ecx
+ 20f: f7 d9 neg %ecx
+ 211: 48 85 d2 test %rdx,%rdx
+ 214: 0f 48 c1 cmovs %ecx,%eax
+ 217: c3 retq
+ 218: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
+ 21f: 00
+ 220: 66 48 0f 7e c0 movq %xmm0,%rax
+ 225: b9 34 00 00 00 mov $0x34,%ecx
+ 22a: 29 f1 sub %esi,%ecx
+ 22c: 48 d3 f8 sar %cl,%rax
+ 22f: 89 c1 mov %eax,%ecx
+ 231: 81 c9 00 00 00 80 or $0x80000000,%ecx
+ 237: 83 fe 1f cmp $0x1f,%esi
+ 23a: 0f 44 c1 cmove %ecx,%eax
+ 23d: 89 c1 mov %eax,%ecx
+ 23f: f7 d9 neg %ecx
+ 241: 48 85 d2 test %rdx,%rdx
+ 244: 0f 48 c1 cmovs %ecx,%eax
+ 247: c3 retq
+ 248: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
+ 24f: 00
+
+ This improves kraken pbkdf2 by 10.8% and sha256 by 7.5%.
+
+                                       baseline              patched
+
+ stanford-crypto-pbkdf2             153.195+-2.745   ^   138.204+-2.513   ^ definitely 1.1085x faster
+ stanford-crypto-sha256-iterative    49.047+-1.038   ^    45.610+-1.235   ^ definitely 1.0754x faster
+
+ <arithmetic>                       101.121+-1.379   ^    91.907+-1.500   ^ definitely 1.1003x faster
+
+ * assembler/CPU.h:
+ (JSC::hasSensibleDoubleToInt):
+ * dfg/DFGSpeculativeJIT.cpp:
+ (JSC::DFG::SpeculativeJIT::compileValueToInt32):
+ * ftl/FTLLowerDFGToB3.cpp:
+ (JSC::FTL::DFG::LowerDFGToB3::doubleToInt32):
+ (JSC::FTL::DFG::LowerDFGToB3::sensibleDoubleToInt32):
+ * ftl/FTLOutput.cpp:
+ (JSC::FTL::Output::hasSensibleDoubleToInt): Deleted.
+ * ftl/FTLOutput.h:
+ * runtime/MathCommon.cpp:
+ (JSC::operationToInt32SensibleSlow):
+ * runtime/MathCommon.h:
+
2017-02-03 Joseph Pecoraro <[email protected]>
Unreviewed rollout of r211486, r211629.
Modified: trunk/Source/JavaScriptCore/assembler/CPU.h (211669 => 211670)
--- trunk/Source/JavaScriptCore/assembler/CPU.h 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/assembler/CPU.h 2017-02-04 13:46:19 UTC (rev 211670)
@@ -85,5 +85,10 @@
return isX86_64() && Options::useArchitectureSpecificOptimizations();
}
+inline bool hasSensibleDoubleToInt()
+{
+ return optimizeForX86();
+}
+
} // namespace JSC
Modified: trunk/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp (211669 => 211670)
--- trunk/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp 2017-02-04 13:46:19 UTC (rev 211670)
@@ -2208,7 +2208,8 @@
GPRReg gpr = result.gpr();
JITCompiler::Jump notTruncatedToInteger = m_jit.branchTruncateDoubleToInt32(fpr, gpr, JITCompiler::BranchIfTruncateFailed);
- addSlowPathGenerator(slowPathCall(notTruncatedToInteger, this, operationToInt32, NeedToSpill, ExceptionCheckRequirement::CheckNotNeeded, gpr, fpr));
+ addSlowPathGenerator(slowPathCall(notTruncatedToInteger, this,
+ hasSensibleDoubleToInt() ? operationToInt32SensibleSlow : operationToInt32, NeedToSpill, ExceptionCheckRequirement::CheckNotNeeded, gpr, fpr));
int32Result(gpr, node);
return;
Modified: trunk/Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp (211669 => 211670)
--- trunk/Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp 2017-02-04 13:46:19 UTC (rev 211670)
@@ -11591,7 +11591,7 @@
LValue doubleToInt32(LValue doubleValue)
{
- if (Output::hasSensibleDoubleToInt())
+ if (hasSensibleDoubleToInt())
return sensibleDoubleToInt32(doubleValue);
double limit = pow(2, 31) - 1;
@@ -11611,7 +11611,7 @@
LBasicBlock lastNext = m_out.appendTo(slowPath, continuation);
ValueFromBlock slowResult = m_out.anchor(
- m_out.call(Int32, m_out.operation(operationToInt32), doubleValue));
+ m_out.call(Int32, m_out.operation(operationToInt32SensibleSlow), doubleValue));
m_out.jump(continuation);
m_out.appendTo(continuation, lastNext);
Modified: trunk/Source/JavaScriptCore/ftl/FTLOutput.cpp (211669 => 211670)
--- trunk/Source/JavaScriptCore/ftl/FTLOutput.cpp 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/ftl/FTLOutput.cpp 2017-02-04 13:46:19 UTC (rev 211670)
@@ -333,11 +333,6 @@
return callWithoutSideEffects(B3::Double, logDouble, value);
}
-bool Output::hasSensibleDoubleToInt()
-{
- return optimizeForX86();
-}
-
LValue Output::doubleToInt(LValue value)
{
PatchpointValue* result = patchpoint(Int32);
Modified: trunk/Source/JavaScriptCore/ftl/FTLOutput.h (211669 => 211670)
--- trunk/Source/JavaScriptCore/ftl/FTLOutput.h 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/ftl/FTLOutput.h 2017-02-04 13:46:19 UTC (rev 211670)
@@ -193,7 +193,6 @@
LValue doubleLog(LValue);
- static bool hasSensibleDoubleToInt();
LValue doubleToInt(LValue);
LValue doubleToUInt(LValue);
Modified: trunk/Source/JavaScriptCore/runtime/MathCommon.cpp (211669 => 211670)
--- trunk/Source/JavaScriptCore/runtime/MathCommon.cpp 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/runtime/MathCommon.cpp 2017-02-04 13:46:19 UTC (rev 211670)
@@ -467,6 +467,71 @@
return JSC::toInt32(value);
}
+int32_t JIT_OPERATION operationToInt32SensibleSlow(double number)
+{
+ // This function is a specialization of `operationToInt32` for the slow case
+ // of the sensible double-to-int32 operation. It is available on x86.
+ //
+ // In the sensible double-to-int32 operation, we first attempt to truncate
+ // the double value to int32 by using cvttsd2si_rr.
+ // According to Intel's manual, cvttsd2si performs the following truncation
+ // operation.
+ //
+ // If src = NaN, +-Inf, or |(src)rz| > 0x7fffffff and (src)rz != 0x80000000,
+ // the result becomes 0x80000000. Otherwise, the operation succeeds.
+ // Note that ()rz is rounding towards zero.
+ //
+ // We call this slow case function when the above cvttsd2si fails. We check
+ // this condition by performing `result == 0x80000000`. So this function only
+ // accepts the following numbers.
+ //
+ // NaN, +-Inf, |(src)rz| > 0x7fffffff.
+ //
+ // As a result, the exp of the double is always >= 31.
+ // This condition simplifies and speeds up the toInt32 implementation.
+ int64_t bits = WTF::bitwise_cast<int64_t>(number);
+ int32_t exp = (static_cast<int32_t>(bits >> 52) & 0x7ff) - 0x3ff;
+
+ // If exponent < 0 there will be no bits to the left of the decimal point
+ // after rounding; if the exponent is > 83 then no bits of precision can be
+ // left in the low 32-bit range of the result (IEEE-754 doubles have 52 bits
+ // of fractional precision).
+ // Note this case handles 0, -0, and all infinite, NaN, & denormal values.
+
+ // If exp < 0, the truncate operation succeeds, so this function does not
+ // encounter that case. If exp > 83, it means exp >= 84. In that case,
+ // the following operation produces 0 for the result.
+ ASSERT(exp >= 0);
+
+ // Select the appropriate 32-bits from the floating point mantissa. If the
+ // exponent is 52 then the bits we need to select are already aligned to the
+ // lowest bits of the 64-bit integer representation of the number, no need
+ // to shift. If the exponent is greater than 52 we need to shift the value
+ // left by (exp - 52), if the value is less than 52 we need to shift right
+ // accordingly.
+ int32_t result = (exp > 52)
+ ? static_cast<int32_t>(bits << (exp - 52))
+ : static_cast<int32_t>(bits >> (52 - exp));
+
+ // IEEE-754 double precision values are stored omitting an implicit 1 before
+ // the decimal point; we need to reinsert this now. We may also have shifted
+ // invalid bits into the result that are not a part of the mantissa (the sign
+ // and exponent bits from the floating-point representation); mask these out.
+ //
+ // The important observation is that exp is always >= 31. So the above case
+ // needs to be handled only when exp == 31.
+ ASSERT(exp >= 31);
+ if (exp == 31) {
+ int32_t missingOne = 1 << exp;
+ result &= (missingOne - 1);
+ result += missingOne;
+ }
+
+ // If the input value was negative (we could test either 'number' or 'bits',
+ // but testing 'bits' is likely faster) invert the result appropriately.
+ return bits < 0 ? -result : result;
+}
+
#if HAVE(ARM_IDIV_INSTRUCTIONS)
static inline bool isStrictInt32(double value)
{
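The interplay between the truncating fast path and the new slow path in the MathCommon.cpp change above can be sketched in portable C++. Everything below is an assumption made for illustration. `emulatedTruncate` only imitates the 0x80000000 failure value described in the comment for cvttsd2si, and the function names are hypothetical. Unlike the patch, the sketch uses unsigned arithmetic and returns 0 explicitly when exp > 83, so that every shift count stays below the operand width.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable stand-in for the cvttsd2si fast path (illustrative only; the
// JIT emits the real instruction). Yields 0x80000000 (INT32_MIN) when
// the truncated value does not fit in int32 or the input is NaN.
int32_t emulatedTruncate(double number)
{
    if (number > -2147483649.0 && number < 2147483648.0)
        return static_cast<int32_t>(number); // truncates toward zero
    return INT32_MIN;
}

// Sketch of the slow path for inputs on which the fast path failed.
int32_t toInt32SensibleSlowSketch(double number)
{
    uint64_t bits;
    std::memcpy(&bits, &number, sizeof(bits));
    int32_t exp = static_cast<int32_t>((bits >> 52) & 0x7ff) - 0x3ff;
    assert(exp >= 31); // holds because the fast path reported failure

    if (exp > 83)
        return 0; // explicit guard; also covers NaN and +-Inf (exp == 1024)

    // Select the 32 bits of the mantissa that land in the int32 result.
    uint32_t result = exp > 52
        ? static_cast<uint32_t>(bits << (exp - 52))
        : static_cast<uint32_t>(bits >> (52 - exp));

    // Only exp == 31 needs the implicit leading one re-inserted.
    if (exp == 31)
        result = (result & 0x7fffffffu) | 0x80000000u;

    if (bits >> 63)
        result = 0u - result; // two's-complement negation for negatives

    // Modular conversion on mainstream compilers and in C++20.
    return static_cast<int32_t>(result);
}

// Hypothetical driver: try the fast path and fall back on the sentinel.
int32_t sensibleDoubleToInt32Sketch(double number)
{
    int32_t fast = emulatedTruncate(number);
    if (fast != INT32_MIN)
        return fast;
    return toInt32SensibleSlowSketch(number);
}
```

Note that an input of exactly -2147483648.0 also produces the sentinel, so it takes the slow path too. Its exponent is 31, so the precondition still holds.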
Modified: trunk/Source/JavaScriptCore/runtime/MathCommon.h (211669 => 211670)
--- trunk/Source/JavaScriptCore/runtime/MathCommon.h 2017-02-04 08:25:10 UTC (rev 211669)
+++ trunk/Source/JavaScriptCore/runtime/MathCommon.h 2017-02-04 13:46:19 UTC (rev 211670)
@@ -33,6 +33,7 @@
const int32_t maxExponentForIntegerMathPow = 1000;
double JIT_OPERATION operationMathPow(double x, double y) WTF_INTERNAL;
int32_t JIT_OPERATION operationToInt32(double) WTF_INTERNAL;
+int32_t JIT_OPERATION operationToInt32SensibleSlow(double) WTF_INTERNAL;
inline constexpr double maxSafeInteger()
{