Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6]

2020-10-14 Thread Hohensee, Paul
My apologies. I relied on the other reviewers. I'll do an independent review in 
the future.

Thanks,
Paul

On 10/14/20, 11:02 AM, "core-libs-dev on behalf of Roger Riggs" 
 
wrote:

On Wed, 14 Oct 2020 15:01:42 GMT, Roger Riggs  wrote:

>> Due to the requirement for multiple reviewers, I had been waiting to add 
my review of the Core-Libs files until the
>> HotSpot reviewers had approved!  I see only one reviewer credited in the 
commit.
>
> This integration without testing with a current merge from the master and 
has caused two build failures.
>
> JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar
>
> JDK-8254775: Microbenchmark StringIndexOfChar doesn't compile
>
> There is a raw unicode character in the JMH test that causes a 
compilation error.
> == Output from failing command(s) repeated here ===
> [2020-10-14T14:39:09,608Z] * For target 
support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch:
> [2020-10-14T14:39:09,611Z]
> 
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unmappable character (0xE2) for encoding ascii 
[2020-10-14T14:39:09,611Z]
> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]  
  ^ [2020-10-14T14:39:09,611Z]
> 
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unmappable character (0x98) for encoding ascii 
[2020-10-14T14:39:09,611Z]
> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]  
   ^ [2020-10-14T14:39:09,611Z]
> 
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unmappable character (0xBA) for encoding ascii 
[2020-10-14T14:39:09,611Z]
> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]  
^ [2020-10-14T14:39:09,611Z]
> 
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unclosed character literal [2020-10-14T14:39:09,611Z] 
sb.append(isUtf16?'???':'b');
> [2020-10-14T14:39:09,611Z]   ^ 
[2020-10-14T14:39:09,611Z]
> 
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unclosed character literal [2020-10-14T14:39:09,611Z] 
sb.append(isUtf16?'???':'b');```

And also a failed Graal test because of the new intrinsic.

And JDK-8254785: compiler/graalunit/HotspotTest.java failed with "missing 
Graal intrinsics for:
java/lang/StringLatin1.indexOfChar([BIII)I"

@phohensee don't be so quick to type `/sponsor`; there are three separate 
build and test failures.

-

PR: https://git.openjdk.java.net/jdk/pull/71



Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6]

2020-10-14 Thread Daniel D . Daugherty
On Wed, 14 Oct 2020 17:59:53 GMT, Roger Riggs  wrote:

>> This integration without testing with a current merge from the master and 
>> has caused two build failures.
>> 
>> JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar
>> 
>> JDK-8254775: Microbenchmark StringIndexOfChar doesn't compile
>> 
>> There is a raw unicode character in the JMH test that causes a compilation 
>> error.
>> == Output from failing command(s) repeated here ===
>> [2020-10-14T14:39:09,608Z] * For target 
>> support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch:
>> [2020-10-14T14:39:09,611Z]
>> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
>> error: unmappable character (0xE2) for encoding ascii 
>> [2020-10-14T14:39:09,611Z]
>> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] 
>>^ [2020-10-14T14:39:09,611Z]
>> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
>> error: unmappable character (0x98) for encoding ascii 
>> [2020-10-14T14:39:09,611Z]
>> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] 
>> ^ [2020-10-14T14:39:09,611Z]
>> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
>> error: unmappable character (0xBA) for encoding ascii 
>> [2020-10-14T14:39:09,611Z]
>> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] 
>>  ^ [2020-10-14T14:39:09,611Z]
>> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
>> error: unclosed character literal [2020-10-14T14:39:09,611Z] 
>> sb.append(isUtf16?'???':'b');
>> [2020-10-14T14:39:09,611Z]   ^ 
>> [2020-10-14T14:39:09,611Z]
>> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
>> error: unclosed character literal [2020-10-14T14:39:09,611Z] 
>> sb.append(isUtf16?'???':'b');```
>
> And also a failed Graal test because of the new intrinsic.
> 
> And JDK-8254785: compiler/graalunit/HotspotTest.java failed with "missing 
> Graal intrinsics for:
> java/lang/StringLatin1.indexOfChar([BIII)I"
> @phohensee don't be so quick to type `/sponsor`; there are three separate 
> build and test failures.

@phohensee - @vnkozlov has determined that a new Tier2 test failure is
also caused by this fix. See https://bugs.openjdk.java.net/browse/JDK-8254790.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6]

2020-10-14 Thread Roger Riggs
On Wed, 14 Oct 2020 15:01:42 GMT, Roger Riggs  wrote:

>> Due to the requirement for multiple reviewers, I had been waiting to add my 
>> review of the Core-Libs files until the
>> HotSpot reviewers had approved!  I see only one reviewer credited in the 
>> commit.
>
> This integration without testing with a current merge from the master and has 
> caused two build failures.
> 
> JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar
> 
> JDK-8254775: Microbenchmark StringIndexOfChar doesn't compile
> 
> There is a raw unicode character in the JMH test that causes a compilation 
> error.
> == Output from failing command(s) repeated here ===
> [2020-10-14T14:39:09,608Z] * For target 
> support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch:
> [2020-10-14T14:39:09,611Z]
> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unmappable character (0xE2) for encoding ascii 
> [2020-10-14T14:39:09,611Z]
> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]  
>   ^ [2020-10-14T14:39:09,611Z]
> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unmappable character (0x98) for encoding ascii 
> [2020-10-14T14:39:09,611Z]
> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]  
>^ [2020-10-14T14:39:09,611Z]
> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unmappable character (0xBA) for encoding ascii 
> [2020-10-14T14:39:09,611Z]
> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]  
> ^ [2020-10-14T14:39:09,611Z]
> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unclosed character literal [2020-10-14T14:39:09,611Z] 
> sb.append(isUtf16?'???':'b');
> [2020-10-14T14:39:09,611Z]   ^ 
> [2020-10-14T14:39:09,611Z]
> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
> error: unclosed character literal [2020-10-14T14:39:09,611Z] 
> sb.append(isUtf16?'???':'b');```

And also a failed Graal test because of the new intrinsic.

And JDK-8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal 
intrinsics for:
java/lang/StringLatin1.indexOfChar([BIII)I"

@phohensee don't be so quick to type `/sponsor`; there are three separate build 
and test failures.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6]

2020-10-14 Thread Roger Riggs
On Wed, 14 Oct 2020 13:18:19 GMT, Roger Riggs  wrote:

>> Marked as reviewed by neliasso (Reviewer).
>
> Due to the requirement for multiple reviewers, I had been waiting to add my 
> review of the Core-Libs files until the
> HotSpot reviewers had approved!  I see only one reviewer credited in the 
> commit.

This integration without testing with a current merge from the master and has 
caused two build failures.
JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar

Also, there is a raw unicode character in the JMH test that causes a 
compilation error.
== Output from failing command(s) repeated here ===
[2020-10-14T14:39:09,608Z] * For target 
support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch:
[2020-10-14T14:39:09,611Z]
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
error: unmappable character (0xE2) for encoding ascii [2020-10-14T14:39:09,611Z]
sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]
^ [2020-10-14T14:39:09,611Z]
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
error: unmappable character (0x98) for encoding ascii [2020-10-14T14:39:09,611Z]
sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]
 ^ [2020-10-14T14:39:09,611Z]
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
error: unmappable character (0xBA) for encoding ascii [2020-10-14T14:39:09,611Z]
sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z]
  ^ [2020-10-14T14:39:09,611Z]
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
error: unclosed character literal [2020-10-14T14:39:09,611Z] 
sb.append(isUtf16?'???':'b');
[2020-10-14T14:39:09,611Z]   ^ 
[2020-10-14T14:39:09,611Z]
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71:
error: unclosed character literal [2020-10-14T14:39:09,611Z] 
sb.append(isUtf16?'???':'b');```

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6]

2020-10-14 Thread Roger Riggs
On Wed, 14 Oct 2020 12:03:42 GMT, Nils Eliasson  wrote:

>> Jason Tatton has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Added missing copyright notices
>
> Marked as reviewed by neliasso (Reviewer).

Due to the requirement for multiple reviewers, I had been waiting to add my 
review of the Core-Libs files until the
HotSpot reviewers had approved!  I see only one reviewer credited in the commit.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6]

2020-10-14 Thread Nils Eliasson
On Mon, 12 Oct 2020 11:17:25 GMT, Jason Tatton 
 wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
>> byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
>> intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality 
>> to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
>> patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> 
>> I have created a jtreg test 
>> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
>> intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code 
>> path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In 
>> summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
>> • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> 
>> For the x86 implementation there may be two further improvements we can make 
>> in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> 
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
>> | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
>> ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 
>>  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
>> ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 
>> indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been 
>> achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Added missing copyright notices

Marked as reviewed by neliasso (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v3]

2020-09-22 Thread Volker Simonis
On Mon, 21 Sep 2020 12:45:55 GMT, Jason Tatton 
 wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
>> byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
>> intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality 
>> to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
>> patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> 
>> I have created a jtreg test 
>> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
>> intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code 
>> path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In 
>> summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
>> • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> 
>> For the x86 implementation there may be two further improvements we can make 
>> in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> 
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
>> | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
>> ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 
>>  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
>> ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 
>> indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been 
>> achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add missing newline to end of vmSymbols.cpp

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1889:

> 1887: bind(SCAN_TO_32_CHAR_LOOP);
> 1888: vmovdqu(vec3, Address(result, 0));
> 1889: vpcmpeqb(vec3, vec3, vec1, 1);

Using `Assembler::AVX_256bit` instead of `1` will make this easier to 
understand.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

2020-09-22 Thread Volker Simonis
On Fri, 18 Sep 2020 11:56:09 GMT, Jason Tatton 
 wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
>> byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
>> intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality 
>> to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
>> patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> 
>> I have created a jtreg test 
>> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
>> intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code 
>> path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In 
>> summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
>> • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> 
>> For the x86 implementation there may be two further improvements we can make 
>> in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> 
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
>> | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
>> ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 
>>  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
>> ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 
>> indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been 
>> achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now
> contains four commits:
>  - Merge master
>  - 8173585: further whitespace changes required by jcheck
>  - JDK-8173585 - whitespace changes required by jcheck
>  - JDK-8173585

Hi Jason,

thanks for bringing String.indexOf() for latin strings up to date with the 
Unicode version.

Your changes look good except a few minor issues I've commented on right in the 
code.

I'd only like to ask you if you could possibly improve your test a little bit. 
As far as I understand, your search text
is a consecutive sequence of "abc" characters, so you'll always find the 
character your searching for within the next
three characters of the source text. This won't exercise the loops of your 
intrinsic. Maybe you can also add some test
versions where the search character will be found beyond the first 32/64 
characters after "fromIndex"?

test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java 
line 24:

> 22:
> 23: public static void main(String[] args) throws Exception {
> 24: for (int i = 0; i < 100_0; ++i) {//repeat such that we enter into 
> C2 code...

The placement of the underscore looks strange to me. I'd expect it to separate 
thousands (like 1_000) if at all but not
sure if id use it for one thousand at all as that's really not such a big 
number that it is hard to read..

Also, the Tier4InvocationThreshold is 5000 so I'm not sure youre reaching C2?

src/hotspot/share/classfile/vmSymbols.cpp line 295:

> 293:   if (symbol == NULL)  return NO_SID;
> 294:   return find_sid(symbol);
> 295: }

I think it is common sense to have a newline at the end of a file.

test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java 
line 84:

> 82: }
> 83:
> 84:  }

Please put an EOL at the end of the file.


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-22 Thread Andrew Dinn
On Tue, 22 Sep 2020 02:31:14 GMT, Vladimir Kozlov  wrote:

>>>  >Can you explain where this restricted effect is documented?
>> 
>>> Certainly! I’ve found that determining the capability of the CPU and 
>>> whether to enable AVX2 support if the chip
>>> supports it is mostly controlled in: vm_version_x86.cpp specifically: 
>>> get_processor_features and in
>>> generate_get_cpu_info.
>> 
>> Yes, I can see what the code does. I was asking where the cpu behaviour is 
>> documented independent of the code.
>> 
>>> In order to test the patch comprehensively I had to track down an Intel 
>>> Core i7 (I7-9750H) processor which the
>>> aforementioned code permitted AVX2 instructions for (maybe this is an error 
>>> and it should not be enabled for this
>>> processor though) as most of the infrastructure I personally use here at 
>>> AWS runs on Intel Xeon processors - I also
>>> tested on a E5-2680 which the JVM does not enable AVX2 for.
>> 
>> 'maybe'? The documentation Andrew provided mentioned Xeon E5 v3 which I 
>> believe is a Skylake design. However, the code
>> I pointed you at in vm_version_x86 which claims to detect 'early Skylake' 
>> designs is only disabling AVX512 support. It
>> still enables AVX2. Similarly, the code that generates machine code to check 
>> the processor capabilities has a special
>> check if use_evex is set (i.e. AVX3 is requested)  which disables AVX512 for 
>> Skylake but does not disable AVX2 support.
>>> However, this is just the Intel side of things. When it comes to AMD I read 
>>> that the AMD Zen 2 architecture, of which
>>> the current flagship: Threadripper 3990X, is based, is able to support AVX2 
>>> without the frequency scaling which
>>> some/all(?) of the Intel chips incur. I personally don’t have access to one 
>>> of these chips so I cannot confirm how it
>>> is classified in the JVM.
>> 
>> Well, it would be good to know where you read that and to see if that 
>> confirms thar the code is avoiding the issue
>> Andrew raised.
>>> Also, I found when investigating this that there is actually a JVM flag 
>>> which can be used to control what level of AVX
>>> is enabled: -XX:UseAVX=version.
>> 
>> Yes, indeed. However, what I am trying to understand is whether the current 
>> code is bypassing the problem Andrew
>> brought up in the cases where that problem actually exists. It doesn't look 
>> like it so far given that the problem
>> applies to AVX2 and only AVX512 support is being disabled and, even then 
>> only for some (Skylake) processors. Without
>> some clear documentation of what processors suffer from this power surge 
>> problem it will not be possible to decide
>> whether this patch is doing the right thing or not.
>
> Based on comment by @jatin-bhateja (Intel) frequency level switchover pointed 
> by @theRealAph  is sensitive to vector
> size https://github.com/openjdk/jdk/pull/144#issuecomment-692044896
> 
> By keeping vector size less or equal to 32 bytes we should avoid it. And as I 
> can see this intrinsic code is using 32
> bytes (chars) and 16 bytes vectors: `pbroadcastb(vec1, vec1, 
> Assembler::AVX_256bit);`
> Also we never had issues with AVX2. only with AVX512 regarding performance 
> hit:
> https://bugs.openjdk.java.net/browse/JDK-8221092
> 
> I would like to see performance numbers for for all values of UseAVX flag : 
> 0, 1, 2, 3
> 
> The usage is guarded UseSSE42Intrinsics in UseSSE42Intrinsics predicate in 
> .ad file. Make sure to test with UseAVX=0 to
> make sure that some avx instructions are not mixed into non avx code. And 
> also with UseSSE=2 (for example) to make sure
> shared code correctly recognize that intrinsics is not supported.

@vnkozlov @mknjc  @jatin-bhateja Thanks for providing the relevant details. I'm 
now quite content that this patch
avoids any potential frequency scaling problem. I'm also glad that an 
explanation of why this is so is now
available --  although it's not perfect that we are relying on a stackoverflow 
post for the full details.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-21 Thread mknjc
On Fri, 18 Sep 2020 23:11:46 GMT, Jason Tatton 
 wrote:

>> "the JVM has knowledge of the AVX capability of the chip it’s running on and 
>> disables the AVX2 code path for chips
>> which suffer from the performance degradation which has been outlined in 
>> this discussion"
>> Does it? The white paper Andrew cited doesn't mention this as being specific 
>> to only some chips that implement AVX2.
>> Can you explain where this restricted effect is documented?
>> Also, I assume you are referring to the code in vm_version_x86.cpp with this 
>> comment
>> 
>> // Don't use AVX-512 on older Skylakes unless explicitly requested
>> 
>> is that correct?
>
>> Can you explain where this restricted effect is documented?
> 
> Certainly! I’ve found that determining the capability of the CPU and whether 
> to enable AVX2 support if the chip
> supports it is mostly controlled in: [vm_version_x86.cpp](
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp)
>  specifically:
> [get_processor_features](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L684-L755)
> and in [generate_get_cpu_info](
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L69-L611).
>   In order to test the
> patch comprehensively I had to track down an Intel Core i7 (I7-9750H) 
> processor which the aforementioned code permitted
> AVX2 instructions for (maybe this is an error and it should not be enabled 
> for this processor though) as most of the
> infrastructure I personally use here at AWS runs on Intel Xeon processors - I 
> also tested on a E5-2680 which the JVM
> does not enable AVX2 for.  However, this is just the Intel side of things. 
> When it comes to AMD I read that the AMD Zen
> 2 architecture, of which the current flagship: Threadripper 3990X, is based, 
> is able to support AVX2 [without the
> frequency scaling](
> https://www.anandtech.com/Show/Index/14525?cPage=7=False=0=9=amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome)
> which some/all(?) of the Intel chips incur. I personally don’t have access to 
> one of these chips so I cannot confirm
> how it is classified in the JVM.  Also, I found when investigating this that 
> there is actually a JVM flag which can be
> used to control what level of AVX is enabled: `-XX:UseAVX=version.`

I really don't see the problem with using AVX for this?
As long as the used instructions all only activate license l0 the cpu don't 
scale the frequency at all. For AVX2 these
are most of all instructions which don't use the FMA or floating point ports. 
Additionally the cpu doesn't instant
scale down the frequency but runs the 256 bit instructions with reduced 
throughput but full cpu clock until enough
instructions use the avx command set. For more information see 
https://stackoverflow.com/a/56861355/130541

So as long the 512bit width instructions aren't used there should no frequency 
scaling happening.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-21 Thread Vladimir Kozlov
On Mon, 21 Sep 2020 09:20:56 GMT, Andrew Dinn  wrote:

>>> Can you explain where this restricted effect is documented?
>> 
>> Certainly! I’ve found that determining the capability of the CPU and whether 
>> to enable AVX2 support if the chip
>> supports it is mostly controlled in: [vm_version_x86.cpp](
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp)
>>  specifically:
>> [get_processor_features](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L684-L755)
>> and in [generate_get_cpu_info](
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L69-L611).
>>   In order to test the
>> patch comprehensively I had to track down an Intel Core i7 (I7-9750H) 
>> processor which the aforementioned code permitted
>> AVX2 instructions for (maybe this is an error and it should not be enabled 
>> for this processor though) as most of the
>> infrastructure I personally use here at AWS runs on Intel Xeon processors - 
>> I also tested on a E5-2680 which the JVM
>> does not enable AVX2 for.  However, this is just the Intel side of things. 
>> When it comes to AMD I read that the AMD Zen
>> 2 architecture, of which the current flagship: Threadripper 3990X, is based, 
>> is able to support AVX2 [without the
>> frequency scaling](
>> https://www.anandtech.com/Show/Index/14525?cPage=7=False=0=9=amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome)
>> which some/all(?) of the Intel chips incur. I personally don’t have access 
>> to one of these chips so I cannot confirm
>> how it is classified in the JVM.  Also, I found when investigating this that 
>> there is actually a JVM flag which can be
>> used to control what level of AVX is enabled: `-XX:UseAVX=version.`
>
>>  >Can you explain where this restricted effect is documented?
> 
>> Certainly! I’ve found that determining the capability of the CPU and whether 
>> to enable AVX2 support if the chip
>> supports it is mostly controlled in: vm_version_x86.cpp specifically: 
>> get_processor_features and in
>> generate_get_cpu_info.
> 
> Yes, I can see what the code does. I was asking where the cpu behaviour is 
> documented independent of the code.
> 
>> In order to test the patch comprehensively I had to track down an Intel Core 
>> i7 (I7-9750H) processor which the
>> aforementioned code permitted AVX2 instructions for (maybe this is an error 
>> and it should not be enabled for this
>> processor though) as most of the infrastructure I personally use here at AWS 
>> runs on Intel Xeon processors - I also
>> tested on a E5-2680 which the JVM does not enable AVX2 for.
> 
> 'maybe'? The documentation Andrew provided mentioned Xeon E5 v3 which I 
> believe is a Skylake design. However, the code
> I pointed you at in vm_version_x86 which claims to detect 'early Skylake' 
> designs is only disabling AVX512 support. It
> still enables AVX2. Similarly, the code that generates machine code to check 
> the processor capabilities has a special
> check if use_evex is set (i.e. AVX3 is requested)  which disables AVX512 for 
> Skylake but does not disable AVX2 support.
>> However, this is just the Intel side of things. When it comes to AMD I read 
>> that the AMD Zen 2 architecture, of which
>> the current flagship: Threadripper 3990X, is based, is able to support AVX2 
>> without the frequency scaling which
>> some/all(?) of the Intel chips incur. I personally don’t have access to one 
>> of these chips so I cannot confirm how it
>> is classified in the JVM.
> 
> Well, it would be good to know where you read that and to see if that 
> confirms thar the code is avoiding the issue
> Andrew raised.
>> Also, I found when investigating this that there is actually a JVM flag 
>> which can be used to control what level of AVX
>> is enabled: -XX:UseAVX=version.
> 
> Yes, indeed. However, what I am trying to understand is whether the current 
> code is bypassing the problem Andrew
> brought up in the cases where that problem actually exists. It doesn't look 
> like it so far given that the problem
> applies to AVX2 and only AVX512 support is being disabled and, even then only 
> for some (Skylake) processors. Without
> some clear documentation of what processors suffer from this power surge 
> problem it will not be possible to decide
> whether this patch is doing the right thing or not.

Based on comment by @jatin-bhateja (Intel) frequency level switchover pointed 
by @theRealAph  is sensitive to vector
size https://github.com/openjdk/jdk/pull/144#issuecomment-692044896

By keeping vector size less or equal to 32 bytes we should avoid it. And as I 
can see this intrinsic code is using 32
bytes (chars) and 16 bytes vectors: `pbroadcastb(vec1, vec1, 
Assembler::AVX_256bit);`

Also we never had issues with AVX2. only with AVX512 regarding performance hit:
https://bugs.openjdk.java.net/browse/JDK-8221092

I would like to see performance numbers for for all values of UseAVX flag : 0, 
1, 2, 

Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v3]

2020-09-21 Thread Jason Tatton
> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
> byte encoded Strings). It is provided for
> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
> intrinsic for StringUTF16. To incorporate it
> I had to make a small change to StringLatin1.java (refactor of functionality 
> to intrisified private method) as well as
> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
> patch contains a change to hotspot and
> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
> 
> Details of testing:
> 
> I have created a jtreg test 
> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
> intrinsic. Note
> that, particularly for the x86 implementation of the intrinsic, the code path 
> taken is dependent upon the length of the
> input String. Hence the test has been designed to cover all these cases. In 
> summary they are:
> - A “short” string of < 16 characters.
> - A SIMD String of 16 – 31 characters.
> - A AVX2 SIMD String of 32 characters+.
> 
> Hardware used for testing:
> -
> 
> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
> • Intel i7 processor (with AVX2 support).
> - AWS Graviton 2 (ARM 64 processor).
> 
> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
> 
> Possible future enhancements:
> 
> For the x86 implementation there may be two further improvements we can make 
> in order to improve performance of both
> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
> 1. Make use of AVX-512 instructions.
> 2. For “short” Strings (see below), I think it may be possible to modify the 
> existing algorithm to still use SSE SIMD
> instructions instead of a loop.
> Benchmark results:
> 
> **Without** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
> ns/op |
> 
> 
> **With** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716  
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
> ns/op |
> 
> 
> The objective of the patch is to bring the performance of StringLatin1 
> indexOf(char) in line with StringUTF16
> indexOf(char) for x86 and ARM64. We can see above that this has been 
> achieved. Similar results were obtained when
> running on ARM.

Jason Tatton has updated the pull request incrementally with one additional 
commit since the last revision:

  Add missing newline to end of vmSymbols.cpp

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/71/files
  - new: https://git.openjdk.java.net/jdk/pull/71/files/b85a7fb4..c8cc441e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=71=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=71=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/71.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

2020-09-21 Thread Volker Simonis
On Fri, 18 Sep 2020 11:56:09 GMT, Jason Tatton 
 wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
>> byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
>> intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality 
>> to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
>> patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> 
>> I have created a jtreg test 
>> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
>> intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code 
>> path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In 
>> summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
>> • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> 
>> For the x86 implementation there may be two further improvements we can make 
>> in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> 
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
>> | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
>> ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 
>>  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
>> ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 
>> indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been 
>> achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now
> contains four commits:
>  - Merge master
>  - 8173585: further whitespace changes required by jcheck
>  - JDK-8173585 - whitespace changes required by jcheck
>  - JDK-8173585

src/hotspot/share/classfile/vmSymbols.cpp line 295:

> 293:   if (symbol == NULL)  return NO_SID;
> 294:   return find_sid(symbol);
> 295: }

I think it is common sense to have a newline at the end of a file.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-21 Thread Andrew Dinn
On Fri, 18 Sep 2020 23:11:46 GMT, Jason Tatton 
 wrote:

>  >Can you explain where this restricted effect is documented?

> Certainly! I’ve found that determining the capability of the CPU and whether 
> to enable AVX2 support if the chip
> supports it is mostly controlled in: vm_version_x86.cpp specifically: 
> get_processor_features and in
> generate_get_cpu_info.

Yes, I can see what the code does. I was asking where the cpu behaviour is 
documented independent of the code.

> In order to test the patch comprehensively I had to track down an Intel Core 
> i7 (I7-9750H) processor which the
> aforementioned code permitted AVX2 instructions for (maybe this is an error 
> and it should not be enabled for this
> processor though) as most of the infrastructure I personally use here at AWS 
> runs on Intel Xeon processors - I also
> tested on a E5-2680 which the JVM does not enable AVX2 for.

'maybe'? The documentation Andrew provided mentioned Xeon E5 v3 which I believe 
is a Skylake design. However, the code
I pointed you at in vm_version_x86 which claims to detect 'early Skylake' 
designs is only disabling AVX512 support. It
still enables AVX2. Similarly, the code generates machine code to check the 
processor capabilities has a special check
if use_evex is set (i.e. AVX3 is requested) for Skylake which disables AVX512 
but does nto disable AVX2 support.

However, this is just the Intel side of things. When it comes to AMD I read 
that the AMD Zen 2 architecture, of which
the current flagship: Threadripper 3990X, is based, is able to support AVX2 
without the frequency scaling which
some/all(?) of the Intel chips incur. I personally don’t have access to one of 
these chips so I cannot confirm how it
is classified in the JVM.

Also, I found when investigating this that there is actually a JVM flag which 
can be used to control what level of AVX
is enabled: -XX:UseAVX=version.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-18 Thread Jason Tatton
On Fri, 18 Sep 2020 16:31:23 GMT, Andrew Dinn  wrote:

> Can you explain where this restricted effect is documented?

Certainly! I’ve found that determining the capability of the CPU and whether to 
enable AVX2 support if the chip
supports it is mostly controlled in: [vm_version_x86.cpp](
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp)
 specifically:
[get_processor_features](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L684-L755)
and in [generate_get_cpu_info](
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L69-L611).

In order to test the patch comprehensively I had to track down an Intel Core i7 
(I7-9750H) processor which the
aforementioned code permitted AVX2 instructions for (maybe this is an error and 
it should not be enabled for this
processor though) as most of the infrastructure I personally use here at AWS 
runs on Intel Xeon processors - I also
tested on a E5-2680 which the JVM does not enable AVX2 for.

However, this is just the Intel side of things. When it comes to AMD I read 
that the AMD Zen 2 architecture, of which
the current flagship: Threadripper 3990X, is based, is able to support AVX2 
[without the frequency scaling](
https://www.anandtech.com/Show/Index/14525?cPage=7=False=0=9=amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome)
which some/all(?) of the Intel chips incur. I personally don’t have access to 
one of these chips so I cannot confirm
how it is classified in the JVM.

Also, I found when investigating this that there is actually a JVM flag which 
can be used to control what level of AVX
is enabled: `-XX:UseAVX=version.`

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-18 Thread Andrew Dinn
On Fri, 18 Sep 2020 15:55:52 GMT, Jason Tatton 
 wrote:

>> Hi Andrew,
>> 
>> Thanks for coming back to me. Looking on the github 
>> [PR](https://github.com/openjdk/jdk/pull/71) nobody is tagged as a
>> reviewer for this (perhaps this is a feature which is not being used).
>>> That's
>>> because what is actually missing is the justification he asked for. As
>>> Andrew pointed out the change is simple but the reason for implementing
>>> it is not.
>> 
>> There are two separate things here:
>> 1). Justification for the change itself:
>> -The objective and justification for this patch is to bring the performance 
>> of StringLatin1 indexOf(char) in line with
>>  StringUTF16 indexOf(char) for x86 and ARM64. This solves the problem as 
>> raised in
>>  [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also 
>> on the [mailing
>>  
>> list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html).
>> 
>> 2). Discussion around future enhancements - concerning potential use of 
>> AVX-512 instructions and a more optimal
>> implementation for short strings.
>> -These would be separate JBS's I'm not advocating for/against this, they are 
>> just ideas separate from this JBS.
>
> The AVX2 code path represents approximately 1/6th of the patch (1/7th 
> including the infrastructure  code around this).
> I don’t think we should discard the entire patch because 1/6th of the code 
> may have unintended consequences. This is
> especially the case when the rest of the code has been benchmarked, with 
> certainty, to show the desired performance
> improvement has been achieved.   Additionally, I do not see how those 
> unintended consequences will ever be realised
> because the JVM has knowledge of the AVX capability of the chip it’s running 
> on and disables the AVX2 code path for
> chips which suffer from the performance degradation which has been outlined 
> in this discussion. Thus protecting us from
> unintended consequences. Unless we are asserting that this mechanism to 
> globally control the use of AVX2 instructions
> is broken or otherwise non functional I see no reason to remove the AVX2 
> code. And to be consistent we would really
> need to look at removing all instances of AVX2 code in the JVM (of which 
> there is quite a lot).   As I see it there are
> three ways forward:  1.  Accept the patch as is.  2. Modify the patch to 
> remove the AVX code path for x86, and/or any
> other modifications needed. 3. Discard the patch entirely.  At this point I 
> am in favour of approach 1 but happy to
> accept 2 if advised that this is the right thing to do.

"the JVM has knowledge of the AVX capability of the chip it’s running on and 
disables the AVX2 code path for chips
which suffer from the performance degradation which has been outlined in this 
discussion"

Does it? The white paper Andrew cited doesn't mention this as being specific to 
only some chips that implement AVX2.
Can you explain where this restricted effect is documented?

Also, I assume you are referring to the code in vm_version_x86.cpp with this 
comment

// Don't use AVX-512 on older Skylakes unless explicitly requested

is that correct?

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-18 Thread Jason Tatton
On Fri, 18 Sep 2020 11:04:34 GMT, Jason Tatton 
 wrote:

>> Hi everyone,
>> 
>> This patch is just missing a couple of reviewers... Please can someone step 
>> forward?
>> 
>> I think it's a fairly straightforward change.
>> 
>> -Jason
>
> Hi Andrew,
> 
> Thanks for coming back to me. Looking on the github 
> [PR](https://github.com/openjdk/jdk/pull/71) nobody is tagged as a
> reviewer for this (perhaps this is a feature which is not being used).
>> That's
>> because what is actually missing is the justification he asked for. As
>> Andrew pointed out the change is simple but the reason for implementing
>> it is not.
> 
> There are two separate things here:
> 1). Justification for the change itself:
> -The objective and justification for this patch is to bring the performance 
> of StringLatin1 indexOf(char) in line with
>  StringUTF16 indexOf(char) for x86 and ARM64. This solves the problem as 
> raised in
>  [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also on 
> the [mailing
>  
> list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html).
> 
> 2). Discussion around future enhancements - concerning potential use of 
> AVX-512 instructions and a more optimal
> implementation for short strings.
> -These would be separate JBS's I'm not advocating for/against this, they are 
> just ideas separate from this JBS.

The AVX2 code path represents approximately 1/6th of the patch (1/7th including 
the infrastructure  code around this).
I don’t think we should discard the entire patch because 1/6th of the code may 
have unintended consequences. This is
especially the case when the rest of the code has been benchmarked, with 
certainty, to show the desired performance
improvement has been achieved.   Additionally, I do not see how those 
unintended consequences will ever be realised
because the JVM has knowledge of the AVX capability of the chip it’s running on 
and disables the AVX2 code path for
chips which suffer from the performance degradation which has been outlined in 
this discussion. Thus protecting us from
unintended consequences. Unless we are asserting that this mechanism to 
globally control the use of AVX2 instructions
is broken or otherwise non functional I see no reason to remove the AVX2 code. 
And to be consistent we would really
need to look at removing all instances of AVX2 code in the JVM (of which there 
is quite a lot).   As I see it there are
three ways forward:

1.  Accept the patch as is.
2. Modify the patch to remove the AVX code path for x86, and/or any other 
modifications needed.
3. Discard the patch entirely.

At this point I am in favour of approach 1 but happy to accept 2 if advised 
that this is the right thing to do.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-18 Thread Andrew Dinn
On 18/09/2020 12:06, Jason Tatton wrote:

> There are two separate things here:
> 1). Justification for the change itself:
> -The objective and justification for this patch is to bring the performance 
> of StringLatin1 indexOf(char) in
>  line with StringUTF16 indexOf(char) for x86 and ARM64. This solves the 
> problem as raised in
>  [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also on 
> the [mailing
>  
> list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html).

> 2). Discussion around future enhancements - concerning potential use of 
> AVX-512 instructions and a more optimal
> implementation for short strings.
> -These would be separate JBS's I'm not advocating for/against this, they are 
> just ideas separate from this JBS.
I don't agree that these two things are separable. Andrew's point
applies to both.

In the first case the problem is that the 'evidence' we have does not
testify to the possibility Andrew outlines. Both code examples used to
justify the idea StringLatin1.indexOf(char) will perform 'better' with
an AVX-based intrinsic are micro-benchmarks that do a lot of intensive
String manipulation and nothing else i.e. they won't get hit by the
possible cost of ramping up power for AVX because they make extensive
use of AVX and then stop. That's very unlikely to happen in a real world
case. So, the fact that this change removes the disparity seen with
these benchmarks is still not evidence for a general improvement.

So, I don't (yet) see a reason to make this change and the possibility
still stands that adopting this change may end up making most code that
uses StringLatin1.indexOf(char) worse.

It might be a good idea to consider finding some way to test whether the
cost Andrew has highlighted makes a difference  before committing this
change. I know the same argument might might be raised aginst the
existing intrinsics but surely that's an a fortiori argument for not
proceeding.

regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

2020-09-18 Thread Jason Tatton
> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
> byte encoded Strings). It is provided for
> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
> intrinsic for StringUTF16. To incorporate it
> I had to make a small change to StringLatin1.java (refactor of functionality 
> to intrisified private method) as well as
> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
> patch contains a change to hotspot and
> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
> 
> Details of testing:
> 
> I have created a jtreg test 
> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
> intrinsic. Note
> that, particularly for the x86 implementation of the intrinsic, the code path 
> taken is dependent upon the length of the
> input String. Hence the test has been designed to cover all these cases. In 
> summary they are:
> - A “short” string of < 16 characters.
> - A SIMD String of 16 – 31 characters.
> - A AVX2 SIMD String of 32 characters+.
> 
> Hardware used for testing:
> -
> 
> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
> • Intel i7 processor (with AVX2 support).
> - AWS Graviton 2 (ARM 64 processor).
> 
> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
> 
> Possible future enhancements:
> 
> For the x86 implementation there may be two further improvements we can make 
> in order to improve performance of both
> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
> 1. Make use of AVX-512 instructions.
> 2. For “short” Strings (see below), I think it may be possible to modify the 
> existing algorithm to still use SSE SIMD
> instructions instead of a loop.
> Benchmark results:
> 
> **Without** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
> ns/op |
> 
> 
> **With** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716  
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
> ns/op |
> 
> 
> The objective of the patch is to bring the performance of StringLatin1 
> indexOf(char) in line with StringUTF16
> indexOf(char) for x86 and ARM64. We can see above that this has been 
> achieved. Similar results were obtained when
> running on ARM.

Jason Tatton has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now
contains four commits:

 - Merge master
 - 8173585: further whitespace changes required by jcheck
 - JDK-8173585 - whitespace changes required by jcheck
 - JDK-8173585

-

Changes: https://git.openjdk.java.net/jdk/pull/71/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=71=01
  Stats: 523 lines in 16 files changed: 506 ins; 0 del; 17 mod
  Patch: https://git.openjdk.java.net/jdk/pull/71.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-18 Thread Jason Tatton
On Thu, 17 Sep 2020 18:20:08 GMT, Jason Tatton 
 wrote:

>> Hi Andrew,
>> 
>> The current indexOf char intrinsics for StringUTF16 and the new one here for 
>> StringLatin1 both use the AVX2 – i.e. 256
>> bit instructions, these are also affected by the frequency scaling which 
>> affects the AVX-512 instructions you pointed
>> out. Of course in a world where all the work taking place is AVX 
>> instructions this wouldn’t be an issue but in mixed
>> mode execution this is a problem.  However, the compiler does have knowledge 
>> of the capability of the CPU upon which
>> it’s optimizing code for and is able to decide whether to use AVX 
>> instructions if they are supported by the CPU AND if
>> it wouldn’t be detrimental for performance. In fact, there is a flag which 
>> one can use to interact with
>> this: -XX:UseAVX=version.  This of course made testing this patch an 
>> interesting experience as the AVX2 instructions
>> were not enabled on the Xeon processors which I had access to at AWS, but in 
>> the end I was able to use an i7 on my
>> corporate macbook to validate the code.  From: mlbridge[bot] 
>>  Sent: 11 September 2020 17:01
>> To: openjdk/jdk  Cc: Tatton, Jason 
>> ; Mention 
>> Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) 
>> (#71)
>> 
>> 
>> Mailing list message from Andrew Haley on 
>> hotspot-dev:
>> 
>> On 11/09/2020 11:23, Jason Tatton wrote:
>> 
>> For the x86 implementation there may be two further improvements we
>> can make in order to improve performance of both the StringUTF16 and
>> StringLatin1 indexOf(char) intrinsics:
>> 
>> 1. Make use of AVX-512 instructions.
>> 
>> Is this really a good idea?
>> 
>> When the processor detects Intel AVX instructions, additional
>> voltage is applied to the core. With the additional voltage applied,
>> the processor could run hotter, requiring the operating frequency to
>> be reduced to maintain operations within the TDP limits. The higher
>> voltage is maintained for 1 millisecond after the last Intel AVX
>> instruction completes, and then the voltage returns to the nominal
>> TDP voltage level.
>> 
>> https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf
>> 
>> So, if StringLatin1.indexOf(char) executes enough to make a difference
>> to any real-world program, the effect may well be to slow down the clock
>> for all of the code that does not use AVX.
>> 
>> 2. For ?short? Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> 
>> --
>> Andrew Haley (he/him)
>> Java Platform Lead Engineer
>> Red Hat UK Ltd. 
>> https://keybase.io/andrewhaley
>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>> 
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly, view it on 
>> GitHub, or
>> unsubscribe.
>
> Hi everyone,
> 
> This patch is just missing a couple of reviewers... Please can someone step 
> forward?
> 
> I think it's a fairly straightforward change.
> 
> -Jason

Hi Andrew,

Thanks for coming back to me. Looking on the github 
[PR](https://github.com/openjdk/jdk/pull/71) nobody is tagged as a
reviewer for this (perhaps this is a feature which is not being used).

> That's
> because what is actually missing is the justification he asked for. As
> Andrew pointed out the change is simple but the reason for implementing
> it is not.

There are two separate things here:
1). Justification for the change itself:
-The objective and justification for this patch is to bring the performance of 
StringLatin1 indexOf(char) in line with
 StringUTF16 indexOf(char) for x86 and ARM64. This solves the problem as raised 
in
 [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also on 
the [mailing
 
list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html).

2). Discussion around future enhancements - concerning potential use of AVX-512 
instructions and a more optimal
implementation for short strings.
-These would be separate JBS's I'm not advocating for/against this, they are 
just ideas separate from this JBS.

-

PR: https://git.openjdk.java.net/jdk/pull/71


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-18 Thread Andrew Dinn
On 17/09/2020 19:22, Jason Tatton wrote:
> This patch is just missing a couple of reviewers... Please can someone step 
> forward?
> 
> I think it's a fairly straightforward change.
I believe you got a review from Andrew Haley -- it was quoted in your
follow-up from which I selected the above response.

What you did not get was license to proceed and push this change. That's
because what is actually missing is the justification he asked for. As
Andrew pointed out the change is simple but the reason for implementing
it is not.

regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-17 Thread Jason Tatton
On Fri, 11 Sep 2020 23:04:01 GMT, Jason Tatton 
 wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
>> byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
>> intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality 
>> to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
>> patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> 
>> I have created a jtreg test 
>> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
>> intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code 
>> path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In 
>> summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
>> • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> 
>> For the x86 implementation there may be two further improvements we can make 
>> in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the 
>> existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> 
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
>> | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
>> ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | - | - |- |- |- 
>> |- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 
>>  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
>> ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 
>> indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been 
>> achieved. Similar results were obtained when
>> running on ARM.
>
> Hi Andrew,
> 
> The current indexOf char intrinsics for StringUTF16 and the new one here for 
> StringLatin1 both use the AVX2 – i.e. 256
> bit instructions, these are also affected by the frequency scaling which 
> affects the AVX-512 instructions you pointed
> out. Of course in a world where all the work taking place is AVX instructions 
> this wouldn’t be an issue but in mixed
> mode execution this is a problem.  However, the compiler does have knowledge 
> of the capability of the CPU upon which
> it’s optimizing code for and is able to decide whether to use AVX 
> instructions if they are supported by the CPU AND if
> it wouldn’t be detrimental for performance. In fact, there is a flag which 
> one can use to interact with
> this: -XX:UseAVX=version.  This of course made testing this patch an 
> interesting experience as the AVX2 instructions
> were not enabled on the Xeon processors which I had access to at AWS, but in 
> the end I was able to use an i7 on my
> corporate macbook to validate the code.  From: mlbridge[bot] 
>  Sent: 11 September 2020 17:01
> To: openjdk/jdk  Cc: Tatton, Jason 
> ; Mention 
> Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) 
> (#71)
> 
> 
> Mailing list message from Andrew Haley on 
> hotspot-dev:
> 
> On 11/09/2020 11:23, Jason Tatton wrote:
> 
> For the x86 implementation there may be two further improvements we
> can make in order to improve performance of both the StringUTF16 and
> StringLatin1 indexOf(char) intrinsics:
> 
> 1. Make use of AVX-512 instructions.
> 
> Is this really a good idea?
> 
> When the processor detects Intel AVX instructions, additional
> voltage is applied to the core. With the additional voltage applied,
> the processor could run hotter, requiring the operating frequency to
> be reduced to maintain operations within the TDP limits. The higher
> voltage is 

Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-14 Thread Andrew Haley
On 12/09/2020 00:06, Jason Tatton wrote:

> The current indexOf char intrinsics for StringUTF16 and the new one
> here for StringLatin1 both use the AVX2 – i.e. 256 bit instructions,
> these are also affected by the frequency scaling which affects the
> AVX-512 instructions you pointed out. Of course in a world where all
> the work taking place is AVX instructions this wouldn’t be an issue
> but in mixed mode execution this is a problem.

Right.

> However, the compiler does have knowledge of the capability of the
> CPU upon which it’s optimizing code for and is able to decide
> whether to use AVX instructions if they are supported by the CPU AND
> if it wouldn’t be detrimental for performance. In fact, there is a
> flag which one can use to interact with this: -XX:UseAVX=version.

Sure. But the question to be answered is this: we can use AVX for
this, but should we? "It's there, so we should use it" isn't sound
reasoning.

My guess is that there are no (non-FP) workloads where AVX dominates.
And that therefore throwing AVX at trivial jobs may make our programs
slower; this is a pessimization.

I have to emphahsize that I don't know the answer to this question.

> This of course made testing this patch an interesting experience as
> the AVX2 instructions were not enabled on the Xeon processors which
> I had access to at AWS, but in the end I was able to use an i7 on my
> corporate macbook to validate the code.

So: is this worth doing? Are there workloads where this helps? Where
this hinders? This is the kind of systems thinking we should be doing
when optimizing.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-11 Thread Jason Tatton
On Tue, 8 Sep 2020 11:59:36 GMT, Jason Tatton 
 wrote:

> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
> byte encoded Strings). It is provided for
> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
> intrinsic for StringUTF16. To incorporate it
> I had to make a small change to StringLatin1.java (refactor of functionality 
> to intrisified private method) as well as
> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this 
> patch contains a change to hotspot and
> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
> 
> Details of testing:
> 
> I have created a jtreg test 
> “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
> intrinsic. Note
> that, particularly for the x86 implementation of the intrinsic, the code path 
> taken is dependent upon the length of the
> input String. Hence the test has been designed to cover all these cases. In 
> summary they are:
> - A “short” string of < 16 characters.
> - A SIMD String of 16 – 31 characters.
> - A AVX2 SIMD String of 32 characters+.
> 
> Hardware used for testing:
> -
> 
> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) 
> • Intel i7 processor (with AVX2 support).
> - AWS Graviton 2 (ARM 64 processor).
> 
> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
> 
> Possible future enhancements:
> 
> For the x86 implementation there may be two further improvements we can make 
> in order to improve performance of both
> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
> 1. Make use of AVX-512 instructions.
> 2. For “short” Strings (see below), I think it may be possible to modify the 
> existing algorithm to still use SSE SIMD
> instructions instead of a loop.
> Benchmark results:
> 
> **Without** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
> ns/op |
> 
> 
> **With** the new StringLatin1 indexOf(char) intrinsic:
> 
> | Benchmark  | Mode | Cnt | Score | Error | Units |
> | - | - |- |- |- 
> |- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716  
> | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
> ns/op |
> 
> 
> The objective of the patch is to bring the performance of StringLatin1 
> indexOf(char) in line with StringUTF16
> indexOf(char) for x86 and ARM64. We can see above that this has been 
> achieved. Similar results were obtained when
> running on ARM.

Hi Andrew,

The current indexOf char intrinsics for StringUTF16 and the new one here for 
StringLatin1 both use the AVX2 – i.e. 256
bit instructions, these are also affected by the frequency scaling which 
affects the AVX-512 instructions you pointed
out. Of course in a world where all the work taking place is AVX instructions 
this wouldn’t be an issue but in mixed
mode execution this is a problem.

However, the compiler does have knowledge of the capability of the CPU upon 
which it’s optimizing code for and is able
to decide whether to use AVX instructions if they are supported by the CPU AND 
if it wouldn’t be detrimental for
performance. In fact, there is a flag which one can use to interact with this: 
-XX:UseAVX=version.

This of course made testing this patch an interesting experience as the AVX2 
instructions were not enabled on the Xeon
processors which I had access to at AWS, but in the end I was able to use an i7 
on my corporate macbook to validate the
code.

From: mlbridge[bot] 
Sent: 11 September 2020 17:01
To: openjdk/jdk 
Cc: Tatton, Jason ; Mention 
Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) (#71)


Mailing list message from Andrew Haley on 
hotspot-dev:

On 11/09/2020 11:23, Jason Tatton wrote:

For the x86 implementation there may be two further improvements we
can make in order to improve performance of both the StringUTF16 and
StringLatin1 indexOf(char) intrinsics:

1. Make use of AVX-512 instructions.

Is this really a good idea?

When the processor detects Intel AVX instructions, additional
voltage is applied to the core. With the additional voltage applied,
the processor could run hotter, requiring the operating frequency to
be reduced to maintain operations within the TDP limits. The higher
voltage is maintained for 1 millisecond after the last Intel AVX
instruction completes, and then the voltage returns to the nominal
TDP voltage level.


Re: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-11 Thread Andrew Haley
On 11/09/2020 11:23, Jason Tatton wrote:

> For the x86 implementation there may be two further improvements we
> can make in order to improve performance of both the StringUTF16 and
> StringLatin1 indexOf(char) intrinsics:
>
> 1. Make use of AVX-512 instructions.

Is this really a good idea?

  When the processor detects Intel AVX instructions, additional
  voltage is applied to the core. With the additional voltage applied,
  the processor could run hotter, requiring the operating frequency to
  be reduced to maintain operations within the TDP limits. The higher
  voltage is maintained for 1 millisecond after the last Intel AVX
  instruction completes, and then the voltage returns to the nominal
  TDP voltage level.

  
https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf

So, if StringLatin1.indexOf(char) executes enough to make a difference
to any real-world program, the effect may well be to slow down the clock
for all of the code that does not use AVX.

> 2. For “short” Strings (see below), I think it may be possible to modify the 
> existing algorithm to still use SSE SIMD
> instructions instead of a loop.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

2020-09-11 Thread Jason Tatton
This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 
byte encoded Strings). It is provided for
x86 and ARM64. The implementation is greatly inspired by the indexOf(char) 
intrinsic for StringUTF16. To incorporate it
I had to make a small change to StringLatin1.java (refactor of functionality to 
intrisified private method) as well as
code for C2.

Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a 
change to hotspot and
java/lang/StringLatin1.java

https://bugs.openjdk.java.net/browse/JDK-8173585

Details of testing:

I have created a jtreg test 
“compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new 
intrinsic. Note
that, particularly for the x86 implementation of the intrinsic, the code path 
taken is dependent upon the length of the
input String. Hence the test has been designed to cover all these cases. In 
summary they are:

- A “short” string of < 16 characters.
- A SIMD String of 16 – 31 characters.
- A AVX2 SIMD String of 32 characters+.

Hardware used for testing:
-

- Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) • 
Intel i7 processor (with AVX2 support).
- AWS Graviton 2 (ARM 64 processor).

I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.

Possible future enhancements:

For the x86 implementation there may be two further improvements we can make in 
order to improve performance of both
the StringUTF16 and StringLatin1 indexOf(char) intrinsics:

1. Make use of AVX-512 instructions.
2. For “short” Strings (see below), I think it may be possible to modify the 
existing algorithm to still use SSE SIMD
instructions instead of a loop.

Benchmark results:

**Without** the new StringLatin1 indexOf(char) intrinsic:

| Benchmark  | Mode | Cnt | Score | Error | Units |
| - | - |- |- |- 
|- |
| IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | 
ns/op |
| IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | 
ns/op |


**With** the new StringLatin1 indexOf(char) intrinsic:

| Benchmark  | Mode | Cnt | Score | Error | Units |
| - | - |- |- |- 
|- |
| IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716  | 
ns/op |
| IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | 
ns/op |


The objective of the patch is to bring the performance of StringLatin1 
indexOf(char) in line with StringUTF16
indexOf(char) for x86 and ARM64. We can see above that this has been achieved. 
Similar results were obtained when
running on ARM.

-

Commit messages:
 - 8173585: further whitespace changes required by jcheck
 - JDK-8173585 - whitespace changes required by jcheck
 - JDK-8173585

Changes: https://git.openjdk.java.net/jdk/pull/71/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=71=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8173585
  Stats: 522 lines in 15 files changed: 506 ins; 0 del; 16 mod
  Patch: https://git.openjdk.java.net/jdk/pull/71.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71

PR: https://git.openjdk.java.net/jdk/pull/71