Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-09-03 Thread Wu Yan
On Wed, 28 Jul 2021 08:51:38 GMT, Andrew Haley  wrote:

>> I don't think we want to keep two copies of the compareTo intrinsic. If 
>> there are no cases where the LDP version is worse than the original version 
>> then we should just delete the old one and replace it with this.
> 
> I agree. The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, 
> Neoverse N2 errata, 2001293, and I see that LDP has to be slowed down on 
> streaming workloads, which will affect this. (That's just an example: I'm 
> making the point that implementations differ.)
> 
> The trouble with this patch is that it (probably) makes things better for 
> long strings, which are very rare. What we actually need to care about is 
> performance for a large number of typical-sized strings, which are names, 
> identifiers, passwords, and so on: about 10-30 characters.

@theRealAph do you have any other questions about this patch?

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-08-23 Thread Wu Yan
On Wed, 28 Jul 2021 08:51:38 GMT, Andrew Haley  wrote:

>> I don't think we want to keep two copies of the compareTo intrinsic. If 
>> there are no cases where the LDP version is worse than the original version 
>> then we should just delete the old one and replace it with this.
> 
> I agree. The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, 
> Neoverse N2 errata, 2001293, and I see that LDP has to be slowed down on 
> streaming workloads, which will affect this. (That's just an example: I'm 
> making the point that implementations differ.)
> 
> The trouble with this patch is that it (probably) makes things better for 
> long strings, which are very rare. What we actually need to care about is 
> performance for a large number of typical-sized strings, which are names, 
> identifiers, passwords, and so on: about 10-30 characters.

Hi @theRealAph @nick-arm, the test data looks OK on Raspberry Pi 4B and 
HiSilicon. Do you have any other questions about this patch?

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-08-03 Thread Wang Huang
On Thu, 15 Jul 2021 03:30:46 GMT, Wang Huang  wrote:

>> Dear all, 
>> Could you please review this patch? It uses `ldp` to implement 
>> String.compareTo.
>>
>> * We added a JMH test case.
>> * Here are the results of that test case:
>>  
>> Benchmark                      | (size) | Mode | Cnt | Score  | Error   | Units
>> -------------------------------|--------|------|-----|--------|---------|------
>> StringCompare.compareLL        | 64     | avgt | 5   | 7.992  | ± 0.005 | us/op
>> StringCompare.compareLL        | 72     | avgt | 5   | 15.029 | ± 0.006 | us/op
>> StringCompare.compareLL        | 80     | avgt | 5   | 14.655 | ± 0.011 | us/op
>> StringCompare.compareLL        | 91     | avgt | 5   | 16.363 | ± 0.12  | us/op
>> StringCompare.compareLL        | 101    | avgt | 5   | 16.966 | ± 0.007 | us/op
>> StringCompare.compareLL        | 121    | avgt | 5   | 19.276 | ± 0.006 | us/op
>> StringCompare.compareLL        | 181    | avgt | 5   | 19.002 | ± 0.417 | us/op
>> StringCompare.compareLL        | 256    | avgt | 5   | 24.707 | ± 0.041 | us/op
>> StringCompare.compareLLWithLdp | 64     | avgt | 5   | 8.001  | ± 0.121 | us/op
>> StringCompare.compareLLWithLdp | 72     | avgt | 5   | 11.573 | ± 0.003 | us/op
>> StringCompare.compareLLWithLdp | 80     | avgt | 5   | 6.861  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 91     | avgt | 5   | 12.774 | ± 0.201 | us/op
>> StringCompare.compareLLWithLdp | 101    | avgt | 5   | 8.691  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 121    | avgt | 5   | 11.091 | ± 1.342 | us/op
>> StringCompare.compareLLWithLdp | 181    | avgt | 5   | 14.64  | ± 0.581 | us/op
>> StringCompare.compareLLWithLdp | 256    | avgt | 5   | 25.879 | ± 1.775 | us/op
>> StringCompare.compareUU        | 64     | avgt | 5   | 13.476 | ± 0.01  | us/op
>> StringCompare.compareUU        | 72     | avgt | 5   | 15.078 | ± 0.006 | us/op
>> StringCompare.compareUU        | 80     | avgt | 5   | 23.512 | ± 0.011 | us/op
>> StringCompare.compareUU        | 91     | avgt | 5   | 24.284 | ± 0.008 | us/op
>> StringCompare.compareUU        | 101    | avgt | 5   | 20.707 | ± 0.017 | us/op
>> StringCompare.compareUU        | 121    | avgt | 5   | 29.302 | ± 0.011 | us/op
>> StringCompare.compareUU        | 181    | avgt | 5   | 39.31  | ± 0.016 | us/op
>> StringCompare.compareUU        | 256    | avgt | 5   | 54.592 | ± 0.392 | us/op
>> StringCompare.compareUUWithLdp | 64     | avgt | 5   | 16.389 | ± 0.008 | us/op
>> StringCompare.compareUUWithLdp | 72     | avgt | 5   | 10.71  | ± 0.158 | us/op
>> StringCompare.compareUUWithLdp | 80     | avgt | 5   | 11.488 | ± 0.024 | us/op
>> StringCompare.compareUUWithLdp | 91     | avgt | 5   | 13.412 | ± 0.006 | us/op
>> StringCompare.compareUUWithLdp | 101    | avgt | 5   | 16.245 | ± 0.434 | us/op
>> StringCompare.compareUUWithLdp | 121    | avgt | 5   | 16.597 | ± 0.016 | us/op
>> StringCompare.compareUUWithLdp | 181    | avgt | 5   | 27.373 | ± 0.017 | us/op
>> StringCompare.compareUUWithLdp | 256    | avgt | 5   | 41.74  | ± 3.5   | us/op
>> 
>> From this table, we can see that in most cases our patch is faster than the 
>> old one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix style and add unalign test case

Thank you for your suggestion. I have pushed a new commit.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-30 Thread Andrew Haley
On 7/30/21 7:49 AM, Wu Yan wrote:

> I agree. That was meant as a compromise, because the optimization
> has no effect (or even causes a slowdown) on some platforms.
> In addition, I found that in
> [JDK-8202326](https://bugs.openjdk.java.net/browse/JDK-8202326),
> adding prefetches is only for long strings (the rare cases), so
> maybe we can further optimize long strings with LDP. Should
> I continue this optimization or close it?

IMO, we don't want to be using the vector unit unless it does
some good, and if you can do this sort of thing in the CPU core
you should, so I like that. I was (still am) tempted to approve
it, but Nick says there are still bugs in corner cases.

I think you should probably close it. Comparison of really long
Strings is so rare that I can't find any examples of where it
actually happens. Array comparisons, sure, but Strings, not so
much.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-30 Thread Wu Yan
On Wed, 28 Jul 2021 09:55:18 GMT, Nick Gasson  wrote:

> Adding prefetches was one of the reasons to introduce the separate stub for 
> long strings, see the mail below:
> 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-April/02.html


Thank you for pointing this out. We had not realized that adding prefetches 
was one of the reasons for that optimization.

> Did you find there's no benefit to that?

At first we tested and found that adding prefetch made performance worse in 
some cases, so we removed prefetch from the LDP version. After more testing, 
we found that prefetch is not the cause of the performance degradation. Sorry 
about that; please ignore the prefetch problem. I will add prefetch back next.


> We don't really want to have different implementations for each 
> microarchitecture; that would be a testing nightmare.

I agree. That was meant as a compromise, because the optimization has no 
effect (or even causes a slowdown) on some platforms. 
In addition, I found that in 
[JDK-8202326](https://bugs.openjdk.java.net/browse/JDK-8202326), adding 
prefetches is only for long strings (the rare cases), so maybe we can further 
optimize long strings with LDP. Should I continue this optimization or close 
it?

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Nick Gasson
On Wed, 28 Jul 2021 09:29:25 GMT, Wu Yan  wrote:

> 
> We are testing on HiSilicon TSV110. Maybe we can enable this optimization by 
> default on the verified platforms.

We don't really want to have different implementations for each 
microarchitecture; that would be a testing nightmare. 

The existing stub uses prefetch instructions if 
`SoftwarePrefetchHintDistance >= 0` but the new LDP version doesn't. Did you 
find there's no benefit to that? Adding prefetches was one of the reasons to 
introduce the separate stub for long strings, see the mail below:

https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-April/02.html

It seems the existing code was tuned for Thunder X/X2 so perhaps that's why 
Andrew sees little improvement there with the new version.

What testing have you done besides benchmarking? The patch linked above had at 
least two subtle bugs in corner cases.
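
As an illustration of the kind of non-benchmark testing asked about here, a minimal sketch follows. It cross-checks `String.compareTo` against a plain scalar loop for every length up to a bound, every position of the first differing character, and Latin-1, UTF-16 and mixed operands. The names `CompareToCrossCheck`, `referenceCompare`, `make` and `countMismatches` are hypothetical and not part of the patch, and the sketch assumes compact strings are enabled so that an all-ASCII string is stored as Latin-1.

```java
import java.util.Arrays;

/** Hypothetical cross-check of String.compareTo against a scalar reference loop. */
final class CompareToCrossCheck {

    /** Reference semantics from the String.compareTo Javadoc. */
    static int referenceCompare(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x != y) {
                return x - y; // difference of the first differing chars
            }
        }
        return a.length() - b.length(); // one string is a prefix of the other
    }

    /** Builds a string of 'length' copies of 'fill', with 'diff' at index diffAt (ignored if out of range). */
    static String make(int length, char fill, int diffAt, char diff) {
        char[] chars = new char[length];
        Arrays.fill(chars, fill);
        if (diffAt >= 0 && diffAt < length) {
            chars[diffAt] = diff;
        }
        return new String(chars);
    }

    /** Returns how often String.compareTo disagrees with the reference (expected: 0). */
    static int countMismatches(int maxLength) {
        // Each row: {fill for a, fill for b, differing char placed in b}.
        char[][] cases = {
            {'a', 'a', 'b'},                // Latin-1 vs Latin-1
            {'\u4e00', '\u4e00', '\u4e01'}, // UTF-16 vs UTF-16
            {'a', 'a', '\u4e00'},           // Latin-1 vs UTF-16 (assumes compact strings)
        };
        int mismatches = 0;
        for (char[] c : cases) {
            for (int len = 0; len <= maxLength; len++) {
                for (int delta = -1; delta <= 1; delta++) {
                    int lenB = len + delta;
                    if (lenB < 0) {
                        continue;
                    }
                    // diffAt == -1 means "no differing char": only the lengths can differ.
                    for (int diffAt = -1; diffAt < lenB; diffAt++) {
                        String a = make(len, c[0], -1, c[0]);
                        String b = make(lenB, c[1], diffAt, c[2]);
                        if (a.compareTo(b) != referenceCompare(a, b)) {
                            mismatches++;
                        }
                        if (b.compareTo(a) != referenceCompare(b, a)) {
                            mismatches++;
                        }
                    }
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(200));
    }
}
```

On HotSpot the intrinsic is only used once the calling code has been JIT-compiled, so a real test would need to run long enough for that to happen. The sketch also does not control the alignment of the underlying arrays.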

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Wu Yan
On Wed, 28 Jul 2021 08:51:38 GMT, Andrew Haley  wrote:

> The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, Neoverse 
> N2 errata, 2001293, and I see that LDP has to be slowed down on streaming 
> workloads, which will affect this. (That's just an example: I'm making the 
> point that implementations differ.)

We are testing on HiSilicon TSV110. Maybe we can enable this optimization by 
default on the verified platforms.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Andrew Haley
On Wed, 28 Jul 2021 08:25:08 GMT, Nick Gasson  wrote:

> I don't think we want to keep two copies of the compareTo intrinsic. If there 
> are no cases where the LDP version is worse than the original version then we 
> should just delete the old one and replace it with this.

I agree. The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, 
Neoverse N2 errata, 2001293, and I see that LDP has to be slowed down on 
streaming workloads, which will affect this.

The trouble with this patch is that it (probably) makes things better for long 
strings, which are very rare. What we actually need to care about is 
performance for a large number of typical-sized strings, which are names, 
identifiers, passwords, and so on: about 10-30 characters.
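
To make the "typical-sized strings" case concrete, here is a rough sketch of a workload generator for strings of 10 to 30 characters, plus a hand-rolled timing loop. All names (`ShortStringWorkload`, `generate`, `twins`, `comparePairs`) are hypothetical, and the choice to let each pair differ only in its last character is an assumption made so that every comparison scans the whole string. A hand-rolled timer like this is much less reliable than JMH and is only meant to show the shape of the measurement.

```java
import java.util.Random;

/** Hypothetical generator for short-string compareTo workloads. */
final class ShortStringWorkload {

    /** Generates 'count' lowercase ASCII strings with lengths in [minLen, maxLen]; minLen must be >= 1. */
    static String[] generate(int count, int minLen, int maxLen, long seed) {
        Random random = new Random(seed);
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            int len = minLen + random.nextInt(maxLen - minLen + 1);
            char[] chars = new char[len];
            for (int j = 0; j < len; j++) {
                chars[j] = (char) ('a' + random.nextInt(26));
            }
            out[i] = new String(chars);
        }
        return out;
    }

    /** Returns copies of the inputs that differ from them only in the last character. */
    static String[] twins(String[] base) {
        String[] out = new String[base.length];
        for (int i = 0; i < base.length; i++) {
            char[] chars = base[i].toCharArray();
            int last = chars.length - 1;
            chars[last] = (chars[last] == 'z') ? 'a' : (char) (chars[last] + 1);
            out[i] = new String(chars);
        }
        return out;
    }

    /** Compares each pair once; the returned sum keeps the JIT from discarding the calls. */
    static long comparePairs(String[] left, String[] right) {
        long sum = 0;
        for (int i = 0; i < left.length; i++) {
            sum += Integer.signum(left[i].compareTo(right[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] base = generate(100_000, 10, 30, 1L);
        String[] other = twins(base);
        long sink = 0;
        for (int round = 0; round < 20; round++) { // warm-up so the loop gets compiled
            sink += comparePairs(base, other);
        }
        int rounds = 100;
        long start = System.nanoTime();
        for (int round = 0; round < rounds; round++) {
            sink += comparePairs(base, other);
        }
        long elapsed = System.nanoTime() - start;
        double perCompare = elapsed / (double) ((long) rounds * base.length);
        System.out.printf("%.2f ns/compare (sink=%d)%n", perCompare, sink);
    }
}
```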

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Nick Gasson
On Thu, 15 Jul 2021 03:30:46 GMT, Wang Huang  wrote:

>> Dear all, 
>> Could you please review this patch? It uses `ldp` to implement 
>> String.compareTo.
>>
>> * We added a JMH test case.
>> * Here are the results of that test case:
>>  
>> Benchmark                      | (size) | Mode | Cnt | Score  | Error   | Units
>> -------------------------------|--------|------|-----|--------|---------|------
>> StringCompare.compareLL        | 64     | avgt | 5   | 7.992  | ± 0.005 | us/op
>> StringCompare.compareLL        | 72     | avgt | 5   | 15.029 | ± 0.006 | us/op
>> StringCompare.compareLL        | 80     | avgt | 5   | 14.655 | ± 0.011 | us/op
>> StringCompare.compareLL        | 91     | avgt | 5   | 16.363 | ± 0.12  | us/op
>> StringCompare.compareLL        | 101    | avgt | 5   | 16.966 | ± 0.007 | us/op
>> StringCompare.compareLL        | 121    | avgt | 5   | 19.276 | ± 0.006 | us/op
>> StringCompare.compareLL        | 181    | avgt | 5   | 19.002 | ± 0.417 | us/op
>> StringCompare.compareLL        | 256    | avgt | 5   | 24.707 | ± 0.041 | us/op
>> StringCompare.compareLLWithLdp | 64     | avgt | 5   | 8.001  | ± 0.121 | us/op
>> StringCompare.compareLLWithLdp | 72     | avgt | 5   | 11.573 | ± 0.003 | us/op
>> StringCompare.compareLLWithLdp | 80     | avgt | 5   | 6.861  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 91     | avgt | 5   | 12.774 | ± 0.201 | us/op
>> StringCompare.compareLLWithLdp | 101    | avgt | 5   | 8.691  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 121    | avgt | 5   | 11.091 | ± 1.342 | us/op
>> StringCompare.compareLLWithLdp | 181    | avgt | 5   | 14.64  | ± 0.581 | us/op
>> StringCompare.compareLLWithLdp | 256    | avgt | 5   | 25.879 | ± 1.775 | us/op
>> StringCompare.compareUU        | 64     | avgt | 5   | 13.476 | ± 0.01  | us/op
>> StringCompare.compareUU        | 72     | avgt | 5   | 15.078 | ± 0.006 | us/op
>> StringCompare.compareUU        | 80     | avgt | 5   | 23.512 | ± 0.011 | us/op
>> StringCompare.compareUU        | 91     | avgt | 5   | 24.284 | ± 0.008 | us/op
>> StringCompare.compareUU        | 101    | avgt | 5   | 20.707 | ± 0.017 | us/op
>> StringCompare.compareUU        | 121    | avgt | 5   | 29.302 | ± 0.011 | us/op
>> StringCompare.compareUU        | 181    | avgt | 5   | 39.31  | ± 0.016 | us/op
>> StringCompare.compareUU        | 256    | avgt | 5   | 54.592 | ± 0.392 | us/op
>> StringCompare.compareUUWithLdp | 64     | avgt | 5   | 16.389 | ± 0.008 | us/op
>> StringCompare.compareUUWithLdp | 72     | avgt | 5   | 10.71  | ± 0.158 | us/op
>> StringCompare.compareUUWithLdp | 80     | avgt | 5   | 11.488 | ± 0.024 | us/op
>> StringCompare.compareUUWithLdp | 91     | avgt | 5   | 13.412 | ± 0.006 | us/op
>> StringCompare.compareUUWithLdp | 101    | avgt | 5   | 16.245 | ± 0.434 | us/op
>> StringCompare.compareUUWithLdp | 121    | avgt | 5   | 16.597 | ± 0.016 | us/op
>> StringCompare.compareUUWithLdp | 181    | avgt | 5   | 27.373 | ± 0.017 | us/op
>> StringCompare.compareUUWithLdp | 256    | avgt | 5   | 41.74  | ± 3.5   | us/op
>> 
>> From this table, we can see that in most cases our patch is faster than the 
>> old one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix style and add unalign test case

I don't think we want to keep two copies of the compareTo intrinsic. If there 
are no cases where the LDP version is worse than the original version then we 
should just delete the old one and replace it with this.
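
For readers less familiar with the instruction under discussion: AArch64 `ldp` loads a pair of registers, so with 64-bit registers a single instruction fetches 16 bytes. The sketch below models that loop shape in plain Java for Latin-1 data, using two 64-bit reads per input per iteration and an XOR to locate the first differing byte. It only illustrates the idea and is not the stub from the patch; the class name `PairLoadCompare` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical model of a compare loop that consumes 16 bytes per iteration. */
final class PairLoadCompare {

    /** Compares two Latin-1 byte arrays with String.compareTo semantics. */
    static int compare(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        ByteBuffer bufA = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bufB = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        // Main loop: two 64-bit reads per input stand in for one `ldp` each.
        for (; i + 16 <= limit; i += 16) {
            long x0 = bufA.getLong(i) ^ bufB.getLong(i);
            long x1 = bufA.getLong(i + 8) ^ bufB.getLong(i + 8);
            if ((x0 | x1) != 0) {
                // Little-endian reads put the earliest byte in the lowest bits,
                // so the lowest set bit marks the first differing byte.
                int offset = (x0 != 0)
                        ? Long.numberOfTrailingZeros(x0) >>> 3
                        : 8 + (Long.numberOfTrailingZeros(x1) >>> 3);
                return (a[i + offset] & 0xff) - (b[i + offset] & 0xff);
            }
        }
        // Tail: fewer than 16 bytes remain, compare byte by byte.
        for (; i < limit; i++) {
            if (a[i] != b[i]) {
                return (a[i] & 0xff) - (b[i] & 0xff);
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        char[] chars = new char[41];
        Arrays.fill(chars, 'x');
        chars[40] = 'a';
        String s = new String(chars);
        chars[40] = 'c';
        String t = new String(chars);
        int viaPairs = compare(s.getBytes(StandardCharsets.ISO_8859_1),
                               t.getBytes(StandardCharsets.ISO_8859_1));
        // The two printed values should agree.
        System.out.println(viaPairs + " " + s.compareTo(t));
    }
}
```

The byte-by-byte tail is where short strings spend their time in this model, which is one way to see why a wider main loop mostly helps long inputs.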

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-14 Thread Wang Huang
> Dear all, 
> Could you please review this patch? It uses `ldp` to implement 
> String.compareTo.
>
> * We added a JMH test case.
> * Here are the results of that test case:
>  
> Benchmark                      | (size) | Mode | Cnt | Score  | Error   | Units
> -------------------------------|--------|------|-----|--------|---------|------
> StringCompare.compareLL        | 64     | avgt | 5   | 7.992  | ± 0.005 | us/op
> StringCompare.compareLL        | 72     | avgt | 5   | 15.029 | ± 0.006 | us/op
> StringCompare.compareLL        | 80     | avgt | 5   | 14.655 | ± 0.011 | us/op
> StringCompare.compareLL        | 91     | avgt | 5   | 16.363 | ± 0.12  | us/op
> StringCompare.compareLL        | 101    | avgt | 5   | 16.966 | ± 0.007 | us/op
> StringCompare.compareLL        | 121    | avgt | 5   | 19.276 | ± 0.006 | us/op
> StringCompare.compareLL        | 181    | avgt | 5   | 19.002 | ± 0.417 | us/op
> StringCompare.compareLL        | 256    | avgt | 5   | 24.707 | ± 0.041 | us/op
> StringCompare.compareLLWithLdp | 64     | avgt | 5   | 8.001  | ± 0.121 | us/op
> StringCompare.compareLLWithLdp | 72     | avgt | 5   | 11.573 | ± 0.003 | us/op
> StringCompare.compareLLWithLdp | 80     | avgt | 5   | 6.861  | ± 0.004 | us/op
> StringCompare.compareLLWithLdp | 91     | avgt | 5   | 12.774 | ± 0.201 | us/op
> StringCompare.compareLLWithLdp | 101    | avgt | 5   | 8.691  | ± 0.004 | us/op
> StringCompare.compareLLWithLdp | 121    | avgt | 5   | 11.091 | ± 1.342 | us/op
> StringCompare.compareLLWithLdp | 181    | avgt | 5   | 14.64  | ± 0.581 | us/op
> StringCompare.compareLLWithLdp | 256    | avgt | 5   | 25.879 | ± 1.775 | us/op
> StringCompare.compareUU        | 64     | avgt | 5   | 13.476 | ± 0.01  | us/op
> StringCompare.compareUU        | 72     | avgt | 5   | 15.078 | ± 0.006 | us/op
> StringCompare.compareUU        | 80     | avgt | 5   | 23.512 | ± 0.011 | us/op
> StringCompare.compareUU        | 91     | avgt | 5   | 24.284 | ± 0.008 | us/op
> StringCompare.compareUU        | 101    | avgt | 5   | 20.707 | ± 0.017 | us/op
> StringCompare.compareUU        | 121    | avgt | 5   | 29.302 | ± 0.011 | us/op
> StringCompare.compareUU        | 181    | avgt | 5   | 39.31  | ± 0.016 | us/op
> StringCompare.compareUU        | 256    | avgt | 5   | 54.592 | ± 0.392 | us/op
> StringCompare.compareUUWithLdp | 64     | avgt | 5   | 16.389 | ± 0.008 | us/op
> StringCompare.compareUUWithLdp | 72     | avgt | 5   | 10.71  | ± 0.158 | us/op
> StringCompare.compareUUWithLdp | 80     | avgt | 5   | 11.488 | ± 0.024 | us/op
> StringCompare.compareUUWithLdp | 91     | avgt | 5   | 13.412 | ± 0.006 | us/op
> StringCompare.compareUUWithLdp | 101    | avgt | 5   | 16.245 | ± 0.434 | us/op
> StringCompare.compareUUWithLdp | 121    | avgt | 5   | 16.597 | ± 0.016 | us/op
> StringCompare.compareUUWithLdp | 181    | avgt | 5   | 27.373 | ± 0.017 | us/op
> StringCompare.compareUUWithLdp | 256    | avgt | 5   | 41.74  | ± 3.5   | us/op
> 
> From this table, we can see that in most cases our patch is faster than the 
> old one.
> 
> Thank you for your review. Any suggestions are welcome.
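
To put a number on "most cases", the scores above can be reduced to per-size speedup ratios (baseline score divided by LDP score, so a value above 1 means the LDP variant was faster). The small sketch below does that arithmetic with the scores copied from the table; the class and method names are hypothetical, and error margins are ignored. Counted this way, the LDP variant is slower in 3 of the 16 rows: LL at sizes 64 and 256, and UU at size 64.

```java
/** Hypothetical helper that turns the table's us/op scores into speedup ratios. */
final class LdpSpeedup {
    static final int[] SIZES = {64, 72, 80, 91, 101, 121, 181, 256};
    // Scores in us/op, copied from the table above.
    static final double[] LL     = {7.992, 15.029, 14.655, 16.363, 16.966, 19.276, 19.002, 24.707};
    static final double[] LL_LDP = {8.001, 11.573, 6.861, 12.774, 8.691, 11.091, 14.64, 25.879};
    static final double[] UU     = {13.476, 15.078, 23.512, 24.284, 20.707, 29.302, 39.31, 54.592};
    static final double[] UU_LDP = {16.389, 10.71, 11.488, 13.412, 16.245, 16.597, 27.373, 41.74};

    /** Ratio of baseline time to LDP time; above 1 means LDP was faster. */
    static double speedup(double baseline, double ldp) {
        return baseline / ldp;
    }

    /** Counts the sizes at which the LDP variant scored worse than the baseline. */
    static int countSlower(double[] baseline, double[] ldp) {
        int slower = 0;
        for (int i = 0; i < baseline.length; i++) {
            if (ldp[i] > baseline[i]) {
                slower++;
            }
        }
        return slower;
    }

    public static void main(String[] args) {
        System.out.println("size  LL speedup  UU speedup");
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("%4d  %10.2f  %10.2f%n",
                    SIZES[i], speedup(LL[i], LL_LDP[i]), speedup(UU[i], UU_LDP[i]));
        }
        System.out.println("slower with LDP: LL=" + countSlower(LL, LL_LDP)
                + ", UU=" + countSlower(UU, UU_LDP));
    }
}
```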

Wang Huang has updated the pull request incrementally with one additional 
commit since the last revision:

  fix style and add unalign test case

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4722/files
  - new: https://git.openjdk.java.net/jdk/pull/4722/files/3fa9afcb..c85cd126

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=4722=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4722=02-03

  Stats: 32 lines in 2 files changed: 22 ins; 1 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722