On Mon, 24 Nov 2025 16:28:44 GMT, Mark Powers <[email protected]> wrote:

> SignatureBench.MLDSA with `+UseDilithiumIntrinsics` shows an average 1.61% 
> improvement across all algorithms and data sizes. Measuring 
> SignatureBench.MLDSA against a baseline build without the fix shows an 
> average 2.24% improvement across all algorithms and data sizes.

I need a bit of clarification (I think you are saying there is a regression?).
- `+UseDilithiumIntrinsics` should be redundant (i.e. `vm_version_x86.cpp` 
should automatically detect and turn the feature on).
    - So if I read correctly, the baseline measured already has the original 
intrinsics (implicitly) enabled (see the JMH sketch below)..
        - therefore the 2.24% difference is benchmark noise?
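
For a true baseline the intrinsics would have to be explicitly disabled, since 
they are on by default on capable CPUs. Below is a minimal JMH sketch of that 
setup; the class and method names are illustrative, not the actual 
SignatureBench source, and if `UseDilithiumIntrinsics` is a diagnostic flag on 
a given build, `-XX:+UnlockDiagnosticVMOptions` would also be needed in the 
fork arguments:

```java
// Illustrative sketch: comparing ML-DSA signing with the Dilithium intrinsics
// explicitly disabled vs. the default, where the VM enables them automatically
// on capable hardware.
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
public class MLDSASignSketch {
    private Signature signer;
    private byte[] data;

    @Setup
    public void setup() throws Exception {
        KeyPair kp = KeyPairGenerator.getInstance("ML-DSA").generateKeyPair();
        signer = Signature.getInstance("ML-DSA");
        signer.initSign(kp.getPrivate());
        data = new byte[1024];  // arbitrary message size
    }

    @Benchmark
    @Fork(3)  // default flags: intrinsics auto-enabled on capable CPUs
    public byte[] signDefault() throws Exception {
        signer.update(data);
        return signer.sign();
    }

    @Benchmark
    @Fork(value = 3, jvmArgsAppend = "-XX:-UseDilithiumIntrinsics")  // true baseline
    public byte[] signNoIntrinsics() throws Exception {
        signer.update(data);
        return signer.sign();
    }
}
```

Comparing the two forks attributes the delta directly to the intrinsics, rather 
than to run-to-run noise between separately built baselines.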

In my measurements on AVX512 parts, I had seen between 0% and 6% across 
`SignatureBench.MLDSA`:
    - (some variation on desktop vs. server parts)
    - `SignatureBench.MLDSA.verify` was worse, only 0-2% depending on key size 
(IIRC, a bigger portion of that benchmark was spent in SHA3 instead)
    - `SignatureBench.MLDSA.sign` was better, 4-6% (also depending on data size)

That is also why I had included the other (now deleted) microbenchmark: 
`SignatureBench.MLDSA` has a lot of 'other things' (e.g. SHA3) also happening, 
so the AVX512 intrinsic changes were harder to differentiate from noise (see 
the kernel sketch below).
    - I had measured ~25-50% improvement on purely the 5 intrinsics changed.
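
To illustrate the kernel-level approach: timing one polynomial operation in 
isolation keeps SHA3 and key handling out of the measurement entirely. The 
sketch below is an assumed stand-in (a plain pointwise modular multiply over a 
Dilithium-sized polynomial), not the deleted microbenchmark or the JDK's 
intrinsified code path:

```java
// Stand-in kernel microbenchmark: a pointwise multiply mod the Dilithium
// modulus q = 8380417 over a 256-coefficient polynomial. The real intrinsics
// cover kernels such as the NTT, inverse NTT, multiply, and decompose steps
// inside the JDK's ML-DSA implementation.
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
public class PolyKernelSketch {
    private static final int Q = 8380417;  // Dilithium modulus
    private final int[] a = new int[256];
    private final int[] b = new int[256];

    @Setup
    public void setup() {
        ThreadLocalRandom r = ThreadLocalRandom.current();
        for (int i = 0; i < 256; i++) {
            a[i] = r.nextInt(Q);
            b[i] = r.nextInt(Q);
        }
    }

    @Benchmark
    public int[] pointwiseMul() {
        int[] out = new int[256];
        for (int i = 0; i < 256; i++) {
            // widen to long before reducing mod Q to avoid int overflow
            out[i] = (int) (((long) a[i] * b[i]) % Q);
        }
        return out;
    }
}
```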

Hence the claim 'never worse'. A more precise claim:
    - "New intrinsics seem to be better, but (at least for AVX512) the 
existing intrinsics were already plenty good for MLDSA"

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3571871477
