Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]
On Thu, 2 Jun 2022 01:29:54 GMT, Xiaohong Gong wrote:

>> Checking whether the indexes of masked lanes are inside the valid memory
>> boundary is necessary for masked vector memory access. However, this check
>> can be skipped if the given offset is inside the vector range, which
>> ensures that no IOOBE (IndexOutOfBoundsException) can happen. The masked
>> load APIs already skip this check for common cases, and this patch applies
>> a similar optimization to the masked vector store.
>>
>> The performance of the newly added masked store benchmarks improves by
>> about `1.83x ~ 2.62x` on an x86 system:
>>
>> Benchmark                                  Before     After      Gain   Units
>> StoreMaskedBenchmark.byteStoreArrayMask    12757.936  23291.118  1.826  ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask   1520.932   3921.616  2.578  ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask    2713.031   7122.535  2.625  ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask      4113.772   8220.206  1.998  ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask     1993.986   4874.148  2.444  ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask    8543.593  17821.086  2.086  ops/ms
>>
>> A similar performance gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request with a new target base due to a
> merge or a rebase. The pull request now contains three commits:
>
>  - Merge branch 'jdk:master' to JDK-8286279
>  - Wrap the offset check into a static method
>  - 8286279: [vectorapi] Only check index of masked lanes if offset is out of
>    array boundary for masked store

Marked as reviewed by psandoz (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]
> Checking whether the indexes of masked lanes are inside the valid memory
> boundary is necessary for masked vector memory access. However, this check
> can be skipped if the given offset is inside the vector range, which ensures
> that no IOOBE (IndexOutOfBoundsException) can happen. The masked load APIs
> already skip this check for common cases, and this patch applies a similar
> optimization to the masked vector store.
>
> The performance of the newly added masked store benchmarks improves by
> about `1.83x ~ 2.62x` on an x86 system:
>
> Benchmark                                  Before     After      Gain   Units
> StoreMaskedBenchmark.byteStoreArrayMask    12757.936  23291.118  1.826  ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask   1520.932   3921.616  2.578  ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask    2713.031   7122.535  2.625  ops/ms
> StoreMaskedBenchmark.intStoreArrayMask      4113.772   8220.206  1.998  ops/ms
> StoreMaskedBenchmark.longStoreArrayMask     1993.986   4874.148  2.444  ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask    8543.593  17821.086  2.086  ops/ms
>
> A similar performance gain can also be observed on ARM hardware.

Xiaohong Gong has updated the pull request with a new target base due to a
merge or a rebase. The pull request now contains three commits:

 - Merge branch 'jdk:master' to JDK-8286279
 - Wrap the offset check into a static method
 - 8286279: [vectorapi] Only check index of masked lanes if offset is out of
   array boundary for masked store

-------------

Changes: https://git.openjdk.java.net/jdk/pull/8620/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8620&range=02
  Stats: 216 lines in 9 files changed: 179 ins; 0 del; 37 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8620.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8620/head:pull/8620

PR: https://git.openjdk.java.net/jdk/pull/8620
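For readers following the thread, here is a minimal scalar sketch of the idea described in the PR summary. It is not the code from the patch: the class `MaskedStoreSketch` and the methods `offsetInRange`, `checkMaskedLanes`, and `storeMasked` are hypothetical names chosen for illustration, and plain `int[]`/`boolean[]` arrays stand in for a vector and its mask so that the example runs without the incubator module. The assumption is that when the whole vector fits in the destination array, no lane can be out of bounds, so the per-lane index check is only needed on the slow path.

```java
import java.util.Arrays;

/** Scalar model of a masked vector store. All names are hypothetical. */
final class MaskedStoreSketch {

    /** True when every lane of a vector of vlen lanes at offset fits in the array. */
    static boolean offsetInRange(int offset, int vlen, int length) {
        return offset >= 0 && offset <= length - vlen;
    }

    /** Slow path: verify that each lane selected by the mask maps to a valid index. */
    static void checkMaskedLanes(int offset, boolean[] mask, int length) {
        for (int i = 0; i < mask.length; i++) {
            int index = offset + i;
            if (mask[i] && (index < 0 || index >= length)) {
                throw new IndexOutOfBoundsException(
                        "lane " + i + " maps to index " + index
                        + ", outside [0, " + length + ")");
            }
        }
    }

    /** Stores only the lanes selected by the mask into dst, starting at offset. */
    static void storeMasked(int[] lanes, boolean[] mask, int[] dst, int offset) {
        // Fast path: the whole vector fits, so the per-lane check is skipped.
        if (!offsetInRange(offset, lanes.length, dst.length)) {
            checkMaskedLanes(offset, mask, dst.length);
        }
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                dst[offset + i] = lanes[i];
            }
        }
    }

    public static void main(String[] args) {
        int[] lanes = {1, 2, 3, 4};

        // Offset 2 leaves room for all four lanes: fast path, no lane check.
        int[] dst = new int[6];
        storeMasked(lanes, new boolean[] {true, false, true, true}, dst, 2);
        System.out.println(Arrays.toString(dst));

        // Offset 4 overruns the array, but the mask only selects valid lanes.
        int[] tail = new int[6];
        storeMasked(lanes, new boolean[] {true, true, false, false}, tail, 4);
        System.out.println(Arrays.toString(tail));
    }
}
```

In this model the tail case (offset 4 in a six-element array) still succeeds because the two selected lanes are in bounds; selecting a third lane at that offset would throw `IndexOutOfBoundsException` from the slow path.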