Re: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]

Jatin Bhateja Sun, 11 Jan 2026 01:56:46 -0800

On Sat, 10 Jan 2026 07:11:48 GMT, Shawn M Emery <[email protected]> wrote:


>> I will run the micro benchmark on AMD Turin and report back by early next 
>> week.
>
>> Better to align loop starting address to OptoLoopAlignment
> 
> For parity, should I do this for the other labels in the file as well?
> 
>> I will run the micro benchmark on AMD Turin and report back by early next 
>> week.
> 
> That would be great, thank you for doing this!

Just a note on loop alignment: there are multiple moving parts here. First, 
aligning the starting address of a loop to 64 bytes ([recommendation from Zen5 
optimization guide](https://docs.amd.com/v/u/en-US/58455_1.00), section 2.8.3) 
ensures that small loop bodies are not split across a cache line. If a loop is 
split, there is a code-entry penalty, since for the first iteration the 
front-end has to read multiple L1I cache lines. Once the loop is decoded and 
its uops are in the op-cache (AMD) or DSB (Intel), the uop stream for 
successive iterations is emitted from the op-cache. However, the op-cache is 
shared between the 2 HW threads in an SMT configuration, so with a noisy 
neighbor or context switches the loop's uops can be evicted, and we may hit 
the code-entry penalty again during the lifetime of the loop.

So it's advisable to add alignment for the other labels that precede loops as 
well; we already have OptoLoopAlignment in place for this.
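
To make the cache-line argument concrete, here is a rough, self-contained 
sketch of the arithmetic. It is not HotSpot code: the helper names 
(`padding_for`, `lines_touched`), the 64-byte line size constant, and the 
example address and loop size are all assumptions picked for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in HotSpot.

// Assumed L1I cache line size in bytes (64 on current x86 parts).
constexpr std::uint64_t kCacheLine = 64;

// Padding bytes needed to move addr up to the next multiple of alignment.
// alignment must be a power of two.
constexpr std::uint64_t padding_for(std::uint64_t addr, std::uint64_t alignment) {
  return (alignment - (addr & (alignment - 1))) & (alignment - 1);
}

// Number of cache lines covered by the code region [start, start + size).
constexpr std::uint64_t lines_touched(std::uint64_t start, std::uint64_t size) {
  return size == 0 ? 0
                   : ((start + size - 1) / kCacheLine) - (start / kCacheLine) + 1;
}

// A 40-byte loop body starting 48 bytes into a line straddles two lines.
static_assert(lines_touched(0x1030, 40) == 2, "unaligned loop spans two lines");

// After padding to a 64-byte boundary the same body fits in one line.
static_assert(lines_touched(0x1030 + padding_for(0x1030, 64), 40) == 1,
              "aligned loop fits in one line");
```

With these made-up numbers, 16 bytes of padding move the loop head from 
0x1030 to 0x1040, so the first iteration fetches one L1I line instead of two. 
Loop bodies larger than 64 bytes still span several lines whatever the 
alignment, so the benefit is mainly for small loops.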

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2679380724
