I was able to test on AWS Graviton2 instances (aarch64), but only with JDK 15. The results show that the vectorized approach appears to be the best option, though long comparisons are also an improvement over the baseline.
Based on this, I made a small change to ArrayUtil so that, by default, it uses unsafe long comparisons on aarch64 with older JDKs. I attached an image of the summarized data, but here is the raw output:

BASELINE
Benchmark                        (file)           (preset)  Mode  Cnt      Score      Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm         3  avgt    3      1.192 ±   0.005  ms/op
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm         6  avgt    3      6.066 ±   0.023  ms/op
XZCompressionBenchmark.compress  image1.dcm              3  avgt    3   4125.464 ± 134.683  ms/op
XZCompressionBenchmark.compress  image1.dcm              6  avgt    3   8193.866 ± 205.916  ms/op
XZCompressionBenchmark.compress  large.xml               3  avgt    3   1438.101 ±   7.357  ms/op
XZCompressionBenchmark.compress  large.xml               6  avgt    3  11961.600 ±  38.130  ms/op

UPDATED
Benchmark                                    (file)           (preset)  Mode  Cnt      Score      Error  Units
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3      1.434 ±   0.007  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3      6.694 ±   0.006  ms/op
XZCompressionBenchmark.compress_legacy       image1.dcm              3  avgt    3   4236.741 ± 176.331  ms/op
XZCompressionBenchmark.compress_legacy       image1.dcm              6  avgt    3   8923.713 ± 715.574  ms/op
XZCompressionBenchmark.compress_legacy       large.xml               3  avgt    3   1399.139 ±   5.955  ms/op
XZCompressionBenchmark.compress_legacy       large.xml               6  avgt    3  15793.829 ± 169.169  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3      1.341 ±   0.016  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3      4.441 ±   0.012  ms/op
XZCompressionBenchmark.compress_unsafe_long  image1.dcm              3  avgt    3   4172.261 ±  41.783  ms/op
XZCompressionBenchmark.compress_unsafe_long  image1.dcm              6  avgt    3   7414.503 ± 123.315  ms/op
XZCompressionBenchmark.compress_unsafe_long  large.xml               3  avgt    3   1289.451 ±  10.420  ms/op
XZCompressionBenchmark.compress_unsafe_long  large.xml               6  avgt    3  11355.386 ±  68.198  ms/op
XZCompressionBenchmark.compress_vh_int       ihe_ovly_pr.dcm         3  avgt    3      1.343 ±   0.008  ms/op
XZCompressionBenchmark.compress_vh_int       ihe_ovly_pr.dcm         6  avgt    3      5.137 ±   0.016  ms/op
XZCompressionBenchmark.compress_vh_int       image1.dcm              3  avgt    3   4097.739 ± 162.179  ms/op
XZCompressionBenchmark.compress_vh_int       image1.dcm              6  avgt    3   7865.803 ± 127.912  ms/op
XZCompressionBenchmark.compress_vh_int       large.xml               3  avgt    3   1284.516 ±  26.837  ms/op
XZCompressionBenchmark.compress_vh_int       large.xml               6  avgt    3  12066.120 ± 106.382  ms/op
XZCompressionBenchmark.compress_vh_long      ihe_ovly_pr.dcm         3  avgt    3      1.330 ±   0.009  ms/op
XZCompressionBenchmark.compress_vh_long      ihe_ovly_pr.dcm         6  avgt    3      4.569 ±   0.205  ms/op
XZCompressionBenchmark.compress_vh_long      image1.dcm              3  avgt    3   4110.154 ± 182.196  ms/op
XZCompressionBenchmark.compress_vh_long      image1.dcm              6  avgt    3   7420.054 ± 330.954  ms/op
XZCompressionBenchmark.compress_vh_long      large.xml               3  avgt    3   1271.665 ±  10.160  ms/op
XZCompressionBenchmark.compress_vh_long      large.xml               6  avgt    3  11077.733 ±  64.546  ms/op
XZCompressionBenchmark.compress_vh_vector    ihe_ovly_pr.dcm         3  avgt    3      1.326 ±   0.016  ms/op
XZCompressionBenchmark.compress_vh_vector    ihe_ovly_pr.dcm         6  avgt    3      4.551 ±   0.085  ms/op
XZCompressionBenchmark.compress_vh_vector    image1.dcm              3  avgt    3   4084.482 ±  32.445  ms/op
XZCompressionBenchmark.compress_vh_vector    image1.dcm              6  avgt    3   7670.077 ± 343.810  ms/op
XZCompressionBenchmark.compress_vh_vector    large.xml               3  avgt    3   1274.196 ±   2.831  ms/op
XZCompressionBenchmark.compress_vh_vector    large.xml               6  avgt    3  10485.505 ±  43.182  ms/op
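For anyone reading without the attachment, here is a rough sketch of what a long-based comparison can look like. It is not the code in the attached ArrayUtil.java: the class and method names are made up for illustration, it uses a VarHandle rather than Unsafe so it runs without special access (closer in spirit to the vh_long variant), and the os.arch check is only an assumption about how the aarch64 default could be selected. The caller is assumed to guarantee that both offsets plus limit stay within the array.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

/** Hypothetical sketch; not the attached ArrayUtil.java. */
final class MatchLengthSketch {
    // Views a byte[] as little-endian longs; plain reads work at any index.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Byte-at-a-time reference version. */
    static int matchLengthLegacy(byte[] buf, int a, int b, int limit) {
        int len = 0;
        while (len < limit && buf[a + len] == buf[b + len]) {
            len++;
        }
        return len;
    }

    /** Compares 8 bytes per step and locates the first mismatch via XOR. */
    static int matchLengthLong(byte[] buf, int a, int b, int limit) {
        int len = 0;
        while (len + Long.BYTES <= limit) {
            long x = (long) LONG_LE.get(buf, a + len);
            long y = (long) LONG_LE.get(buf, b + len);
            long diff = x ^ y;
            if (diff != 0) {
                // Little-endian: the lowest set bit lies in the first differing byte.
                return len + (Long.numberOfTrailingZeros(diff) >>> 3);
            }
            len += Long.BYTES;
        }
        // Fewer than 8 bytes left: finish byte by byte.
        while (len < limit && buf[a + len] == buf[b + len]) {
            len++;
        }
        return len;
    }

    /** Assumed selection rule: prefer long comparisons on aarch64. */
    static boolean preferLongCompare(String osArch) {
        return "aarch64".equals(osArch);
    }
}
```

A caller could pick the strategy once at startup, for example with preferLongCompare(System.getProperty("os.arch")), and fall back to the byte-wise loop elsewhere.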
Attachment: ArrayUtil.java