On Sat, 29 Nov 2025 04:55:44 GMT, Jiangli Zhou <[email protected]> wrote:
>> Oh man, the `pos`/`len` modifications in current code are confusing. I >> scratched my head for quite a while trying to comprehend why does `__ >> bind(MESG_BELOW_32_BLKS)` split the `pos += 16` and `len -= 16`? On a >> surface, that just looks like a bug. >> >> But looks that way because we do `initial_blocks_16_avx512` twice, do `pos >> += 16` twice, but only do the `len += 32` after the second call. Which does >> not help if we shortcut after the first call. In fact, I am not at all sure >> that checking `len < 32` _without_ modifying `len` beforehand does not break >> the assumptions downstream: >> >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, >> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, >> SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> >> >> Really, in these kind of intrinsics, _you want_ to make sure `pos` and `len` >> updates are tightly bound together, otherwise these kinds of mistakes would >> keep happening. You will lose on code density a bit, but would have more >> readable and robust code. >> >> Shouldn't it be like this? >> >> >> diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> index 1e728ffa279..a16e25b075d 100644 >> --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> @@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, >> Register len, Register ct, Regist >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, >> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, >> SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> + __ subl(len, 16 * 16); >> + >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, >> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, >> SHUF_MASK, stack_offset + 16); >> __ addl(pos, 16 * 16); >> - __ subl(len, 32 * 16); >> + __ subl(len, 16 * 16); >> >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, NO_BIG_BLKS); >> @@ -3491,24 +3493,27 @@ void StubGenerator::aesgcm_avx512(Register in, >> Register len, Register ct, Regist >> ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, >> CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, >> ADD_1234, SHUF_MASK, >> ... > >> Oh man, the `pos`/`len` modifications in current code are confusing. I >> scratched my head for quite a while trying to comprehend why does `__ >> bind(MESG_BELOW_32_BLKS)` split the `pos += 16` and `len -= 16`? On a >> surface, that just looks like a bug. > > The combination of handling for the fall through from `ENCRYPT_16_BLKS` and > conditional entry to `MESG_BELOW_32_BLKS` cases are subtle. > > I had also missed the fall through case in my initial proposed fix (with > comp/jcc) until @sviswa7 pointed it out and suggested the current fix. The > fix for `StubGenerator::aesgcm_avx512` with moving `__ addl(pos, 16 * 16)` to > be before `__ bind(MESG_BELOW_32_BLKS)` works correctly for both the fall > through and conditional jump cases now. > >> >> But looks that way because we do `initial_blocks_16_avx512` twice, do `pos >> += 16` twice, but only do the `len += 32` after the second call. Which does >> not help if we shortcut after the first call. In fact, I am not at all sure >> that checking `len < 32` _without_ modifying `len` beforehand does not break >> the assumptions downstream: >> >> ``` >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, >> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, >> SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> ``` >> >> Really, in these kind of intrinsics, _you want_ to make sure `pos` and `len` >> updates are tightly bound together, otherwise these kinds of mistakes would >> keep happening. You will lose on code density a bit, but would have more >> readable and robust code. >> >> Shouldn't it be like this? >> >> ``` >> diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> index 1e728ffa279..a16e25b075d 100644 >> --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> @@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, >> Register len, Register ct, Regist >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, >> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, >> SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> + __ subl(len, 16 * 16); >> + >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, >> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, >> ... > @jianglizhou thank you for the AVX2 related output from the unit test > pre-fix. From this I was able to trace the point of failure and see that your > proposed changes are good for approval. Thank you for your work on these > issues! @smemery Thanks for carefully testing the changes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3590993560
