On Wed, 13 May 2026 12:22:54 GMT, Andrew Dinn <[email protected]> wrote:
>> I remember removing this because "this function is a local (static)
>> function, essentially a private helper in this file.. it is only ever called
>> here.. no need to tripple-check every parameter.. and it will just add
>> complexity.. " (changed my mind!)
>> ```C++
>> void StubGenerator::generate_sha3_stubs() {
>> bool avx512Available = VM_Version::supports_evex() &&
>> VM_Version::supports_avx512bw();
>> if (UseSHA3Intrinsics) {
>> if (avx512Available) {
>> StubRoutines::_sha3_implCompress =
>>
>> generate_sha3_implCompress_avx512(StubId::stubgen_sha3_implCompress_id,
>> this, _masm);
>> StubRoutines::_sha3_implCompressMB =
>>
>> generate_sha3_implCompress_avx512(StubId::stubgen_sha3_implCompressMB_id,
>> this, _masm);
>> StubRoutines::_double_keccak =
>> generate_sha3_implCompress_avx512(StubId::stubgen_double_keccak_id,
>> this, _masm);
>> StubRoutines::_quad_keccak =
>> generate_sha3_implCompress_avx512(StubId::stubgen_quad_keccak_id,
>> this, _masm);
>> } else {
>> StubRoutines::_sha3_implCompress =
>>
>> generate_sha3_implCompress_avx2(StubId::stubgen_sha3_implCompress_id, this,
>> _masm);
>> StubRoutines::_double_keccak =
>> generate_sha3_implCompress_avx2(StubId::stubgen_double_keccak_id,
>> this, _masm);
>> StubRoutines::_sha3_implCompressMB =
>>
>> generate_sha3_implCompress_avx2(StubId::stubgen_sha3_implCompressMB_id,
>> this, _masm);
>> }
>> }
>> }
>>
>>
>> but.. I compacted your version to just:
>>
>> ```C++
>> switch(stub_id) {
>> case StubId::stubgen_sha3_implCompress_id:
>> case StubId::stubgen_sha3_implCompressMB_id:
>> case StubId::stubgen_double_keccak_id:
>> case StubId::stubgen_quad_keccak_id:
>> break;
>> default:
>> ShouldNotReachHere();
>> }
>>
>>
>> In fact.. its 'self-documentation' on ways to call this helper.. works for
>> me, thanks!
>
> The more important point was defensive programming - errors do get injected
> during maintenance and the suggested pattern catches them (for me
> maintainability is always the ace trump). However, you are right that this
> also serves to state the intended use of the generator, another valuable
> benefit to maintainers. But there is also a performance consideration to
> address.
>
> This is one example of a generator that produces code for more than one stub.
> For such generators a single up front switch, prior to calling
> `load_archive_data`, is normally used to verify the stub id passed as
> argument and also to initialize stub-dependent local state. The reason is
> that this serves to minimize the amount of branching in the resulting
> compiled generator code. If you look at other multi-stub generators (see
> e.g. the array copy stubs) you should find that the same code pattern has
> been followed consistently.
>
> Of course that does imply a small amount of unnecessary work is done in the
> case where `load_archive_data` returns a non-null address and we don't use
> the state. The trade-off is
> - redundant execution of code that usually does no more than install a few
> zero or small constant values in registers/stack slots
>
> vs
> - postponing that same execution until we can be sure `load_archive_data`
> has returned `nullptr` but then incurring the extra cost of either a repeated
> switch or one or more compare and branch tests to perform the same setup.
>
> Your latest patch has selected that second alternative contrary to what is
> done elsewhere.
>
> I'll also note here that one could try to finesse this unnecessary cost by
> retaining the code pattern you have chosen modulo placing the first
> verification switch inside an `ifdef ASSERT`/`#endif` pair. That would only
> cost us extra in a debug build. However, that leaves a maintainer with more
> code to read and any subsequent reviewer with more code to check.
> Specifically, we have two places we need to update if we should extend the
> multi-stub generator to include a new case and that's a potential bug
> injection opportunity. You might wonder how often this sort of upgrade
> happens but . . . well, here we are.
>
> So, to sum up, my request may have seemed to have been based on a somewhat
> arbitrary and inconsequential question of coding style. In reality it was a
> choice that was considered before this PR was created and one that was
> applied (I hope) consistently across the code base. I'm not going to force
> you to follow the pattern employed elsewhere if ...
Thanks for taking the time to give me more background. And I am entirely
willing to change the code; code style is an absolutely valid reason (and I
have been in the position of not quite being able to articulate 'something
about this is off, but everything I say sounds minor' before, so feel free to..
give me more direction :) )
Also.. reading between the lines.. I think the framing of what we are
optimizing for differs.. I tend(ed) to think about the intrinsic as
C++-vs-asm.. ASM will 'run forever', "C++ runs once-or-thrice", prioritize
readability over 'going crazy over generating the most meta-templated version
of the C++ code". You are looking at AOT, and if everyone 'suffers' from same
approach as me, it adds up to the perennial java complaint of 'slow startup'..
entirely fair..
But because I don't have the intuition for the bigger picture, I think I might
need a more concrete example of what I am doing wrong? I think.. I now have
two switch statements on the (stubid) same value (I had considered merging the
two.. fairly trivial..). I figured this current version is separating 'argument
checking' vs 'code logic'.. but refactoring into 'single switch' is just
something that will take few 10-minutes, so I can certainly rearange things a
bit, no big deal. Only difficulty is that I had started to emit code in the
switch, but this is a fun puzzle of rearanging code while keeping it 'pretty'..
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3236121089