On Wed, 16 Nov 2022 23:16:14 GMT, Volodymyr Paprotski <d...@openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756:
>>
>>> 754:
>>> 755: // Store R^8-R for later use
>>> 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit);
>>
>> Could these vector spills be eliminated? I counted 8 spare zmm registers
>> available across the vector loop (xmm7-xmm12, xmm30, xmm31).
>>
>> And here's what is explicitly used in `process256Loop`:
>>
>> D0 D1 = xmm2-xmm3
>> B0 B1 B2 B3 B4 B5 = xmm19-xmm24
>> TMP = xmm6
>> A0 A1 A2 A3 A4 A5 = xmm13-xmm18
>> R0 R1 R2 R1P R2P = xmm25-xmm29
>> T0 T1 T2 T3 T4 T5 = xmm0-xmm5
>
> Interesting!! Let me try that!

Done!

PS: This find really was great!

PPS: I also reordered the map alphabetically and counted in-order... it was just really bugging me!!

-------------

PR: https://git.openjdk.org/jdk/pull/10582