Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
On Mon, 20 Sep 2021 09:52:45 GMT, Andrew Dinn wrote: >> Andrew, can you help us to approve this? > > I agree with Andrew Haley that this patch is not going to make an improvement > for anything but a very small number of applications. Processing of strings > over a few 10s of bytes is rare. On the other hand the doesn't seem to cause > any performance drop for the much more common case of processing short > strings. so it does no harm. Also, the new and old code are much the same in > terms of complexity so that is no reason to prefer one over the other. The > only real concern I have is that any change involves the risk of error and > the ratio of cases that might benefit to cases that might suffer from an > error is very low. I don't think that's a reason to avoid pushing this patch > upstream but it does suggest that we should not backport it. OK, thanks. That seems like a sensible compromise. - PR: https://git.openjdk.java.net/jdk/pull/4722
Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
On Fri, 17 Sep 2021 07:13:24 GMT, Wu Yan wrote: >> It's fine. I don't think it'll affect any real programs, so it's rather >> pointless. I don't know if that's any reason not to approve it. > > Andrew, can you help us to approve this? I agree with Andrew Haley that this patch is not going to make an improvement for anything but a very small number of applications. Processing of strings over a few 10s of bytes is rare. On the other hand the doesn't seem to cause any performance drop for the much more common case of processing short strings. so it does no harm. Also, the new and old code are much the same in terms of complexity so that is no reason to prefer one over the other. The only real concern I have is that any change involves the risk of error and the ratio of cases that might benefit to cases that might suffer from an error is very low. I don't think that's a reason to avoid pushing this patch upstream but it does suggest that we should not backport it. - PR: https://git.openjdk.java.net/jdk/pull/4722
Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
On Sun, 5 Sep 2021 13:23:21 GMT, Andrew Haley wrote: >> Thanks, I'll fix it. > > It's fine. I don't think it'll affect any real programs, so it's rather > pointless. I don't know if that's any reason not to approve it. Andrew, can you help us to approve this? - PR: https://git.openjdk.java.net/jdk/pull/4722
Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
On Thu, 26 Aug 2021 09:26:24 GMT, Wu Yan wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4871: >> >>> 4869: // exit from large loop when less than 64 bytes left to read or >>> we're about >>> 4870: // to prefetch memory behind array border >>> 4871: int largeLoopExitCondition = MAX(64, >>> SoftwarePrefetchHintDistance)/(isLL ? 1 : 2); >> >> This breaks the Windows AArch64 build: >> >> >> Creating support/modules_libs/java.base/server/jvm.dll from 1051 file(s) >> d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(4871): >> error C3861: 'MAX': identifier not found >> make[3]: *** [lib/CompileJvm.gmk:143: >> /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm >> >> >> https://github.com/Wanghuang-Huawei/jdk/runs/3260986937 >> >> Should probably be left as `MAX2`. > > Thanks, I'll fix it. It's fine. I don't think it'll affect any real programs, so it's rather pointless. I don't know if that's any reason not to approve it. - PR: https://git.openjdk.java.net/jdk/pull/4722
Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
On Wed, 25 Aug 2021 07:40:56 GMT, Nick Gasson wrote: > I've run the benchmark on several different machines and didn't see any > performance regressions, and the speed-up for longer strings looks quite > good. I also ran jtreg tier1-3 with no new failures so I think this is ok. > > If you fix the Windows build I'll approve it. But please wait for another > review, preferably from @theRealAph. OK, Thank you very much! > Note that JDK-8269559 (#5129) is also adding a JMH benchmark for this > intrinsic: it would be good if we could merge them, either now or later. The JMH benchmark added by JDK-8269559 (#5129) can cover our test items (compareToLL and compareToUU), and can show the improvement of our patch, so we decided to delete our JMH benchmark in the next commit. The test results using that JMH benchmark in JDK-8269559 are as follows: Raspberry Pi 4B base: Benchmark (delta) (size) Mode Cnt Score Error Units StringCompareToDifferentLength.compareToLL2 24 avgt3 2.310 ? 0.050 ms/op StringCompareToDifferentLength.compareToLL2 36 avgt3 2.818 ? 0.185 ms/op StringCompareToDifferentLength.compareToLL2 72 avgt3 3.151 ? 0.215 ms/op StringCompareToDifferentLength.compareToLL2 128 avgt3 4.171 ? 1.320 ms/op StringCompareToDifferentLength.compareToLL2 256 avgt3 6.169 ? 0.653 ms/op StringCompareToDifferentLength.compareToLL2 512 avgt3 10.911 ? 0.175 ms/op StringCompareToDifferentLength.compareToLU2 24 avgt3 3.312 ? 0.102 ms/op StringCompareToDifferentLength.compareToLU2 36 avgt3 4.162 ? 0.032 ms/op StringCompareToDifferentLength.compareToLU2 72 avgt3 5.705 ? 0.152 ms/op StringCompareToDifferentLength.compareToLU2 128 avgt3 9.301 ? 0.749 ms/op StringCompareToDifferentLength.compareToLU2 256 avgt3 16.507 ? 1.353 ms/op StringCompareToDifferentLength.compareToLU2 512 avgt3 30.160 ? 0.377 ms/op StringCompareToDifferentLength.compareToUL2 24 avgt3 3.366 ? 0.280 ms/op StringCompareToDifferentLength.compareToUL2 36 avgt3 4.308 ? 0.037 ms/op StringCompareToDifferentLength.compareToUL2 72 avgt3 5.674 ? 0.210 ms/op StringCompareToDifferentLength.compareToUL2 128 avgt3 9.358 ? 0.158 ms/op StringCompareToDifferentLength.compareToUL2 256 avgt3 16.165 ? 0.158 ms/op StringCompareToDifferentLength.compareToUL2 512 avgt3 29.857 ? 0.277 ms/op StringCompareToDifferentLength.compareToUU2 24 avgt3 3.149 ? 0.209 ms/op StringCompareToDifferentLength.compareToUU2 36 avgt3 3.157 ? 0.102 ms/op StringCompareToDifferentLength.compareToUU2 72 avgt3 4.415 ? 0.073 ms/op StringCompareToDifferentLength.compareToUU2 128 avgt3 6.244 ? 0.224 ms/op StringCompareToDifferentLength.compareToUU2 256 avgt3 11.032 ? 0.080 ms/op StringCompareToDifferentLength.compareToUU2 512 avgt3 20.942 ? 3.973 ms/op opt: Benchmark (delta) (size) Mode Cnt Score Error Units StringCompareToDifferentLength.compareToLL2 24 avgt3 2.319 ? 0.121 ms/op StringCompareToDifferentLength.compareToLL2 36 avgt3 2.820 ? 0.096 ms/op StringCompareToDifferentLength.compareToLL2 72 avgt3 2.511 ? 0.024 ms/op StringCompareToDifferentLength.compareToLL2 128 avgt3 3.496 ? 0.382 ms/op StringCompareToDifferentLength.compareToLL2 256 avgt3 5.215 ? 0.210 ms/op StringCompareToDifferentLength.compareToLL2 512 avgt3 7.772 ? 0.448 ms/op StringCompareToDifferentLength.compareToLU2 24 avgt3 3.432 ? 0.249 ms/op StringCompareToDifferentLength.compareToLU2 36 avgt3 4.156 ? 0.052 ms/op StringCompareToDifferentLength.compareToLU2 72 avgt3 5.735 ? 0.043 ms/op StringCompareToDifferentLength.compareToLU2 128 avgt3 9.215 ? 0.394 ms/op StringCompareToDifferentLength.compareToLU2 256 avgt3 16.373 ? 0.515 ms/op StringCompareToDifferentLength.compareToLU2 512 avgt3 29.906 ? 0.245 ms/op StringCompareToDifferentLength.compareToUL2 24 avgt3 3.361 ? 0.116 ms/op StringCompareToDifferentLength.compareToUL2 36 avgt3 4.253 ? 0.061 ms/op StringCompareToDifferentLength.compareToUL2 72 avgt3 5.744 ? 0.082 ms/op StringCompareToDifferentLength.compareToUL2 128 avgt3 9.167 ? 0.343 ms/op StringCompareToDifferentLength.compareToUL2 256 avgt3 16.591 ? 0.999 ms/op
Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
On Fri, 6 Aug 2021 09:50:54 GMT, Wang Huang wrote: >> Dear all, >> Can you do me a favor to review this patch. This patch use `ldp` to >> implement String.compareTo. >> >> * We add a JMH test case >> * Here is the result of this test case >> >> Benchmark |(size)| Mode| Cnt|Score | Error |Units >> -|--|-||--||- >> StringCompare.compareLL | 64 | avgt| 5 |7.992 | ± 0.005|us/op >> StringCompare.compareLL | 72 | avgt| 5 |15.029| ± 0.006|us/op >> StringCompare.compareLL | 80 | avgt| 5 |14.655| ± 0.011|us/op >> StringCompare.compareLL | 91 | avgt| 5 |16.363| ± 0.12 |us/op >> StringCompare.compareLL | 101 | avgt| 5 |16.966| ± 0.007|us/op >> StringCompare.compareLL | 121 | avgt| 5 |19.276| ± 0.006|us/op >> StringCompare.compareLL | 181 | avgt| 5 |19.002| ± 0.417|us/op >> StringCompare.compareLL | 256 | avgt| 5 |24.707| ± 0.041|us/op >> StringCompare.compareLLWithLdp| 64 | avgt| 5 |8.001 | ± >> 0.121|us/op >> StringCompare.compareLLWithLdp| 72 | avgt| 5 |11.573| ± 0.003|us/op >> StringCompare.compareLLWithLdp| 80 | avgt| 5 |6.861 | ± 0.004|us/op >> StringCompare.compareLLWithLdp| 91 | avgt| 5 |12.774| ± 0.201|us/op >> StringCompare.compareLLWithLdp| 101 | avgt| 5 |8.691 | ± 0.004|us/op >> StringCompare.compareLLWithLdp| 121 | avgt| 5 |11.091| ± 1.342|us/op >> StringCompare.compareLLWithLdp| 181 | avgt| 5 |14.64 | ± 0.581|us/op >> StringCompare.compareLLWithLdp| 256 | avgt| 5 |25.879| ± 1.775|us/op >> StringCompare.compareUU | 64 | avgt| 5 |13.476| ± 0.01 |us/op >> StringCompare.compareUU | 72 | avgt| 5 |15.078| ± 0.006|us/op >> StringCompare.compareUU | 80 | avgt| 5 |23.512| ± 0.011|us/op >> StringCompare.compareUU | 91 | avgt| 5 |24.284| ± 0.008|us/op >> StringCompare.compareUU | 101 | avgt| 5 |20.707| ± 0.017|us/op >> StringCompare.compareUU | 121 | avgt| 5 |29.302| ± 0.011|us/op >> StringCompare.compareUU | 181 | avgt| 5 |39.31 | ± >> 0.016|us/op >> StringCompare.compareUU | 256 | avgt| 5 |54.592| ± 0.392|us/op >> StringCompare.compareUUWithLdp| 64 | avgt| 5 |16.389| ± 0.008|us/op >> StringCompare.compareUUWithLdp| 72 | avgt| 5 |10.71 | ± 0.158|us/op >> StringCompare.compareUUWithLdp| 80 | avgt| 5 |11.488| ± 0.024|us/op >> StringCompare.compareUUWithLdp| 91 | avgt| 5 |13.412| ± 0.006|us/op >> StringCompare.compareUUWithLdp| 101 | avgt| 5 |16.245| ± 0.434|us/op >> StringCompare.compareUUWithLdp| 121 | avgt| 5 |16.597| ± 0.016|us/op >> StringCompare.compareUUWithLdp| 181 | avgt| 5 |27.373| ± 0.017|us/op >> StringCompare.compareUUWithLdp| 256 | avgt| 5 |41.74 | ± 3.5 |us/op >> >> From this table, we can see that in most cases, our patch is better than old >> one. >> >> Thank you for your review. Any suggestions are welcome. > > Wang Huang has updated the pull request incrementally with one additional > commit since the last revision: > > fix codestyle I've run the benchmark on several different machines and didn't see any performance regressions, and the speed-up for longer strings looks quite good. I also ran jtreg tier1-3 with no new failures so I think this is ok. If you fix the Windows build I'll approve it. But please wait for another review, preferably from @theRealAph. Note that JDK-8269559 (#5129) is also adding a JMH benchmark for this intrinsic: it would be good if we could merge them, either now or later. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4871: > 4869: // exit from large loop when less than 64 bytes left to read or > we're about > 4870: // to prefetch memory behind array border > 4871: int largeLoopExitCondition = MAX(64, > SoftwarePrefetchHintDistance)/(isLL ? 1 : 2); This breaks the Windows AArch64 build: Creating support/modules_libs/java.base/server/jvm.dll from 1051 file(s) d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(4871): error C3861: 'MAX': identifier not found make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm https://github.com/Wanghuang-Huawei/jdk/runs/3260986937 Should probably be left as `MAX2`. - Changes requested by ngasson (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4722
Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
> Dear all, > Can you do me a favor to review this patch. This patch use `ldp` to > implement String.compareTo. > > * We add a JMH test case > * Here is the result of this test case > > Benchmark|(size)| Mode| Cnt|Score | Error |Units > -|--|-||--||- > StringCompare.compareLL | 64 | avgt| 5 |7.992 | ±0.005|us/op > StringCompare.compareLL | 72 | avgt| 5 |15.029| ±0.006|us/op > StringCompare.compareLL | 80 | avgt| 5 |14.655| ±0.011|us/op > StringCompare.compareLL | 91 | avgt| 5 |16.363| ±0.12 |us/op > StringCompare.compareLL | 101 | avgt| 5 |16.966| ±0.007|us/op > StringCompare.compareLL | 121 | avgt| 5 |19.276| ±0.006|us/op > StringCompare.compareLL | 181 | avgt| 5 |19.002| ±0.417|us/op > StringCompare.compareLL | 256 | avgt| 5 |24.707| ±0.041|us/op > StringCompare.compareLLWithLdp| 64 | avgt| 5 |8.001| ± > 0.121|us/op > StringCompare.compareLLWithLdp| 72 | avgt| 5 |11.573| ±0.003|us/op > StringCompare.compareLLWithLdp| 80 | avgt| 5 |6.861 | ±0.004|us/op > StringCompare.compareLLWithLdp| 91 | avgt| 5 |12.774| ±0.201|us/op > StringCompare.compareLLWithLdp| 101 | avgt| 5 |8.691 | ±0.004|us/op > StringCompare.compareLLWithLdp| 121 | avgt| 5 |11.091| ±1.342|us/op > StringCompare.compareLLWithLdp| 181 | avgt| 5 |14.64 | ±0.581|us/op > StringCompare.compareLLWithLdp| 256 | avgt| 5 |25.879| ±1.775|us/op > StringCompare.compareUU | 64 | avgt| 5 |13.476| ±0.01 |us/op > StringCompare.compareUU | 72 | avgt| 5 |15.078| ±0.006|us/op > StringCompare.compareUU | 80 | avgt| 5 |23.512| ±0.011|us/op > StringCompare.compareUU | 91 | avgt| 5 |24.284| ±0.008|us/op > StringCompare.compareUU | 101 | avgt| 5 |20.707| ±0.017|us/op > StringCompare.compareUU | 121 | avgt| 5 |29.302| ±0.011|us/op > StringCompare.compareUU | 181 | avgt| 5 |39.31| ± > 0.016|us/op > StringCompare.compareUU | 256 | avgt| 5 |54.592| ±0.392|us/op > StringCompare.compareUUWithLdp| 64 | avgt| 5 |16.389| ±0.008|us/op > StringCompare.compareUUWithLdp| 72 | avgt| 5 |10.71 | ±0.158|us/op > StringCompare.compareUUWithLdp| 80 | avgt| 5 |11.488| ±0.024|us/op > StringCompare.compareUUWithLdp| 91 | avgt| 5 |13.412| ±0.006|us/op > StringCompare.compareUUWithLdp| 101 | avgt| 5 |16.245| ±0.434|us/op > StringCompare.compareUUWithLdp| 121 | avgt| 5 |16.597| ±0.016|us/op > StringCompare.compareUUWithLdp| 181 | avgt| 5 |27.373| ±0.017|us/op > StringCompare.compareUUWithLdp| 256 | avgt| 5 |41.74 | ±3.5 |us/op > > From this table, we can see that in most cases, our patch is better than old > one. > > Thank you for your review. Any suggestions are welcome. Wang Huang has updated the pull request incrementally with one additional commit since the last revision: fix codestyle - Changes: - all: https://git.openjdk.java.net/jdk/pull/4722/files - new: https://git.openjdk.java.net/jdk/pull/4722/files/60dd0516..2f756261 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=4722=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4722=04-05 Stats: 9 lines in 1 file changed: 0 ins; 1 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/4722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722 PR: https://git.openjdk.java.net/jdk/pull/4722