To implement `acmp`, if the pointer comparison fails, we check whether either operand is null, whether both operands have the same class, and whether that class is a value class. If all these checks pass, we call `isSubstitutable`.
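The steps above can be sketched in Java as a plain decision procedure. This is a rough model only, not HotSpot or C2 code. `AcmpSketch`, `SpeculationFailed`, `speculatedAcmp` and the stand-in `Point` record are hypothetical names. Value classes are identified by a caller-supplied predicate, and the `isSubstitutable` call is modeled by a `BiPredicate`. Under the same assumptions, `speculatedAcmp` models the profile-based speculation proposed in this patch. A failed type guard is modeled as an exception, standing in for what is assumed to be a deoptimization in compiled code.

```java
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Rough model of the acmp slow path and of profile-based speculation.
// Assumptions (hypothetical, not JVM APIs): value classes are recognized by the
// `isValueClass` predicate, and the isSubstitutable call is modeled by `substitutable`.
final class AcmpSketch {

    // Models a failed speculation (assumed to be an uncommon trap in compiled code).
    static final class SpeculationFailed extends RuntimeException {}

    // Generic path: pointer check, null checks, same-class check, value-class check,
    // and only then the out-of-line substitutability call.
    static boolean acmp(Object a, Object b,
                        Predicate<Class<?>> isValueClass,
                        BiPredicate<Object, Object> substitutable) {
        if (a == b) return true;                    // same reference, or both null
        if (a == null || b == null) return false;   // exactly one operand is null
        Class<?> k = a.getClass();
        if (k != b.getClass()) return false;        // different classes never match
        if (!isValueClass.test(k)) return false;    // identity class: pointer result stands
        return substitutable.test(a, b);            // models the isSubstitutable call
    }

    // Speculated path: the profile says `a` is an instance of the value class `profiled`.
    // Guarding on `a` alone is enough: once its exact class is known, the same-class
    // check on `b` settles the rest, and the comparison can be specialized for `profiled`.
    static boolean speculatedAcmp(Object a, Object b, Class<?> profiled,
                                  BiPredicate<Object, Object> specializedCompare) {
        if (a == b) return true;
        if (a == null || a.getClass() != profiled) {
            throw new SpeculationFailed();          // profile was wrong: bail out
        }
        if (b == null || b.getClass() != profiled) return false;
        return specializedCompare.test(a, b);       // class-specific, inlinable comparison
    }

    public static void main(String[] args) {
        Object p = new Point(1, 2), q = new Point(1, 2);
        // Generic path: reaches the modeled substitutability call.
        System.out.println(acmp(p, q, k -> k == Point.class, Object::equals));
        // Speculated path with a comparison specialized for Point's fields.
        BiPredicate<Object, Object> pointCompare = (x, y) ->
                ((Point) x).x() == ((Point) y).x() && ((Point) x).y() == ((Point) y).y();
        System.out.println(speculatedAcmp(p, q, Point.class, pointCompare));
    }
}

// Stand-in for a value class: for int components, record equals() coincides
// with field-wise substitutability.
record Point(int x, int y) {}
```

Running `AcmpSketch.main` should print `true` twice. Guarding only one operand mirrors the argument made below in the description. Once the class of one operand is pinned, the same-class check decides the other operand, so a comparison specialized for that class can replace the generic call.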
Currently, when profiling information is available, we use it to speculate that operands are null or instances of an identity class (in which case a pointer comparison is all that is needed). This patch proposes to also take advantage of profiling when it indicates that operands are value objects. By speculating that at least one operand is of a given value class, the call to `isSubstitutable` can later be intrinsified, sparing the Java call. Speculating on one operand is enough, since we subsequently check that both operands have the same class.

Some relevant microbenchmarks in `valhalla.acmp.array.Value032` (in ns/op, lower is better):

| | branch_obj_equals000 | branch_val_equals000 | branch_obj_equals025 | branch_val_equals025 |
|---------------|----------------------|----------------------|----------------------|----------------------|
| Before | 4.361 ± 0.153 | 0.907 ± 0.015 | 3.332 ± 0.040 | 0.845 ± 0.032 |
| With this fix | 0.911 ± 0.019 | 0.899 ± 0.021 | 0.838 ± 0.023 | 0.806 ± 0.012 |

<details>
<summary>Full results of <tt>micro:valhalla.acmp.array.Value032</tt></summary>

In more detail, before:

| Benchmark | Mode | Cnt | Score | Error | Units |
|-----------|------|-----|-------|-------|-------|
| Value032.branch_obj_equals000 | avgt | 15 | 4.087 | ± 0.122 | ns/op |
| Value032.branch_obj_equals025 | avgt | 15 | 3.117 | ± 0.067 | ns/op |
| Value032.branch_obj_equals050 | avgt | 15 | 2.235 | ± 0.083 | ns/op |
| Value032.branch_obj_equals075 | avgt | 15 | 1.292 | ± 0.077 | ns/op |
| Value032.branch_obj_equals100 | avgt | 15 | 0.316 | ± 0.004 | ns/op |
| Value032.branch_val_equals000 | avgt | 15 | 0.891 | ± 0.024 | ns/op |
| Value032.branch_val_equals025 | avgt | 15 | 0.839 | ± 0.033 | ns/op |
| Value032.branch_val_equals050 | avgt | 15 | 0.869 | ± 0.087 | ns/op |
| Value032.branch_val_equals075 | avgt | 15 | 0.816 | ± 0.023 | ns/op |
| Value032.branch_val_equals100 | avgt | 15 | 0.906 | ± 0.032 | ns/op |
| Value032.result_obj_equals000 | avgt | 15 | 4.115 | ± 0.185 | ns/op |
| Value032.result_obj_equals025 | avgt | 15 | 3.326 | ± 0.346 | ns/op |
| Value032.result_obj_equals050 | avgt | 15 | 2.353 | ± 0.516 | ns/op |
| Value032.result_obj_equals075 | avgt | 15 | 1.226 | ± 0.040 | ns/op |
| Value032.result_obj_equals100 | avgt | 15 | 0.326 | ± 0.006 | ns/op |
| Value032.result_val_equals000 | avgt | 15 | 0.888 | ± 0.025 | ns/op |
| Value032.result_val_equals025 | avgt | 15 | 0.824 | ± 0.028 | ns/op |
| Value032.result_val_equals050 | avgt | 15 | 0.891 | ± 0.039 | ns/op |
| Value032.result_val_equals075 | avgt | 15 | 0.892 | ± 0.005 | ns/op |
| Value032.result_val_equals100 | avgt | 15 | 0.908 | ± 0.021 | ns/op |
| Value032NullFree.branch_val_equals000 | avgt | 15 | 0.186 | ± 0.012 | ns/op |
| Value032NullFree.branch_val_equals025 | avgt | 15 | 0.461 | ± 0.012 | ns/op |
| Value032NullFree.branch_val_equals050 | avgt | 15 | 0.468 | ± 0.015 | ns/op |
| Value032NullFree.branch_val_equals075 | avgt | 15 | 0.465 | ± 0.014 | ns/op |
| Value032NullFree.branch_val_equals100 | avgt | 15 | 0.189 | ± 0.007 | ns/op |
| Value032NullFree.result_val_equals000 | avgt | 15 | 0.184 | ± 0.004 | ns/op |
| Value032NullFree.result_val_equals025 | avgt | 15 | 0.302 | ± 0.005 | ns/op |
| Value032NullFree.result_val_equals050 | avgt | 15 | 0.448 | ± 0.014 | ns/op |
| Value032NullFree.result_val_equals075 | avgt | 15 | 0.577 | ± 0.014 | ns/op |
| Value032NullFree.result_val_equals100 | avgt | 15 | 0.183 | ± 0.004 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals000 | avgt | 15 | 0.182 | ± 0.006 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals025 | avgt | 15 | 0.470 | ± 0.015 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals050 | avgt | 15 | 0.474 | ± 0.016 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals075 | avgt | 15 | 0.468 | ± 0.017 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals100 | avgt | 15 | 0.188 | ± 0.009 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals000 | avgt | 15 | 0.190 | ± 0.007 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals025 | avgt | 15 | 0.322 | ± 0.017 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals050 | avgt | 15 | 0.473 | ± 0.020 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals075 | avgt | 15 | 0.592 | ± 0.026 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals100 | avgt | 15 | 0.189 | ± 0.017 | ns/op |

After:

| Benchmark | Mode | Cnt | Score | Error | Units |
|-----------|------|-----|-------|-------|-------|
| Value032.branch_obj_equals000 | avgt | 15 | 0.967 | ± 0.039 | ns/op |
| Value032.branch_obj_equals025 | avgt | 15 | 0.890 | ± 0.032 | ns/op |
| Value032.branch_obj_equals050 | avgt | 15 | 0.640 | ± 0.019 | ns/op |
| Value032.branch_obj_equals075 | avgt | 15 | 0.466 | ± 0.022 | ns/op |
| Value032.branch_obj_equals100 | avgt | 15 | 0.320 | ± 0.006 | ns/op |
| Value032.branch_val_equals000 | avgt | 15 | 0.913 | ± 0.025 | ns/op |
| Value032.branch_val_equals025 | avgt | 15 | 0.843 | ± 0.029 | ns/op |
| Value032.branch_val_equals050 | avgt | 15 | 0.818 | ± 0.019 | ns/op |
| Value032.branch_val_equals075 | avgt | 15 | 0.850 | ± 0.018 | ns/op |
| Value032.branch_val_equals100 | avgt | 15 | 0.893 | ± 0.018 | ns/op |
| Value032.result_obj_equals000 | avgt | 15 | 0.927 | ± 0.034 | ns/op |
| Value032.result_obj_equals025 | avgt | 15 | 0.865 | ± 0.031 | ns/op |
| Value032.result_obj_equals050 | avgt | 15 | 0.632 | ± 0.024 | ns/op |
| Value032.result_obj_equals075 | avgt | 15 | 0.470 | ± 0.015 | ns/op |
| Value032.result_obj_equals100 | avgt | 15 | 0.320 | ± 0.007 | ns/op |
| Value032.result_val_equals000 | avgt | 15 | 0.889 | ± 0.021 | ns/op |
| Value032.result_val_equals025 | avgt | 15 | 0.811 | ± 0.023 | ns/op |
| Value032.result_val_equals050 | avgt | 15 | 0.890 | ± 0.025 | ns/op |
| Value032.result_val_equals075 | avgt | 15 | 0.923 | ± 0.036 | ns/op |
| Value032.result_val_equals100 | avgt | 15 | 0.896 | ± 0.027 | ns/op |
| Value032NullFree.branch_val_equals000 | avgt | 15 | 0.184 | ± 0.006 | ns/op |
| Value032NullFree.branch_val_equals025 | avgt | 15 | 0.462 | ± 0.015 | ns/op |
| Value032NullFree.branch_val_equals050 | avgt | 15 | 0.460 | ± 0.027 | ns/op |
| Value032NullFree.branch_val_equals075 | avgt | 15 | 0.446 | ± 0.014 | ns/op |
| Value032NullFree.branch_val_equals100 | avgt | 15 | 0.183 | ± 0.006 | ns/op |
| Value032NullFree.result_val_equals000 | avgt | 15 | 0.188 | ± 0.006 | ns/op |
| Value032NullFree.result_val_equals025 | avgt | 15 | 0.317 | ± 0.009 | ns/op |
| Value032NullFree.result_val_equals050 | avgt | 15 | 0.461 | ± 0.012 | ns/op |
| Value032NullFree.result_val_equals075 | avgt | 15 | 0.584 | ± 0.017 | ns/op |
| Value032NullFree.result_val_equals100 | avgt | 15 | 0.181 | ± 0.005 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals000 | avgt | 15 | 0.184 | ± 0.006 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals025 | avgt | 15 | 0.465 | ± 0.019 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals050 | avgt | 15 | 0.451 | ± 0.018 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals075 | avgt | 15 | 0.468 | ± 0.017 | ns/op |
| Value032NullFreeNonAtomic.branch_val_equals100 | avgt | 15 | 0.187 | ± 0.009 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals000 | avgt | 15 | 0.184 | ± 0.007 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals025 | avgt | 15 | 0.315 | ± 0.011 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals050 | avgt | 15 | 0.464 | ± 0.021 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals075 | avgt | 15 | 0.581 | ± 0.018 | ns/op |
| Value032NullFreeNonAtomic.result_val_equals100 | avgt | 15 | 0.179 | ± 0.002 | ns/op |

Results are the same or better.
</details>

Tested successfully with tier1-4, stress, valhalla-stress.

Thanks,
Marc

-------------

Commit messages:
- Remove printing
- Safer?
- Maybe
- Show stopped
- Not all the time...
- Tell me your secrets!
- Tell me your secrets!
- Fix segfault
- Refactored...
- Hacky prototpype

Changes: https://git.openjdk.org/valhalla/pull/2237/files
Webrev: https://webrevs.openjdk.org/?repo=valhalla&pr=2237&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8376544
Stats: 165 lines in 4 files changed: 94 ins; 43 del; 28 mod
Patch: https://git.openjdk.org/valhalla/pull/2237.diff
Fetch: git fetch https://git.openjdk.org/valhalla.git pull/2237/head:pull/2237

PR: https://git.openjdk.org/valhalla/pull/2237
