On Sat, 10 Jan 2026 07:11:48 GMT, Shawn M Emery <[email protected]> wrote:
>> I will run the micro benchmark on AMD Turin and report back by early next
>> week.
>
>> Better to align loop starting address to OptoLoopAlignment
>
> For parity, should I do this for the other labels in the file as well?
>
>> I will run the micro benchmark on AMD Turin and report back by early next
>> week.
>
> That would be great, thank you for doing this!

Just a note on loop alignment: there are multiple moving parts here.

First, aligning the starting address of a loop to 64 bytes ([recommendation from Zen5 optimization guide](https://docs.amd.com/v/u/en-US/58455_1.00), section 2.8.3) ensures that small loop bodies are not split across a cache line. If a loop body is split, there is a code-entry penalty, because on the first iteration of the loop the front end has to read multiple L1I cache lines. Once the instructions are decoded and the uops are part of the op-cache (AMD) or DSB (Intel), the uop stream for successive loop iterations is emitted from the op-cache.

Since the op-cache is shared between the two HW threads in an SMT configuration, noisy-neighbor scenarios or context switches may cause us to hit the code-entry penalty again during the lifetime of the loop.

So it's advisable to add alignment in this case. For the other labels before loops, we already have OptoLoopAlignment in place.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2679380724
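
To make the cache-line arithmetic in the alignment note concrete, here is a minimal sketch. The helper names `padding_to_align` and `lines_touched` are hypothetical and written only for illustration; this is not HotSpot code, and the 64-byte line size is an assumption taken from the alignment value discussed above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of HotSpot.

// Bytes of padding needed to round `offset` up to the next multiple of
// `alignment`. `alignment` must be a power of two (e.g. 16, 32, 64).
inline std::uint64_t padding_to_align(std::uint64_t offset,
                                      std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// Number of cache lines of `line` bytes touched by a loop body of `size`
// bytes whose first instruction is at `start`.
inline std::uint64_t lines_touched(std::uint64_t start,
                                   std::uint64_t size,
                                   std::uint64_t line) {
    assert(size > 0 && line != 0 && (line & (line - 1)) == 0);
    const std::uint64_t first = start / line;
    const std::uint64_t last = (start + size - 1) / line;
    return last - first + 1;
}
```

For instance, a 24-byte loop body starting at offset 0x1038 touches two 64-byte lines. Adding the 8 bytes of padding computed by `padding_to_align(0x1038, 64)` moves the loop head to 0x1040, where the same body fits in a single line.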
