On Mon, 24 Nov 2025 16:28:44 GMT, Mark Powers <[email protected]> wrote:
> SignatureBench.MLDSA with `+UseDilithiumIntrinsics` shows an average 1.61%
> improvement across all algorithms and data sizes. Measuring
> SignatureBench.MLDSA against a baseline build without the fix, shows an
> average 2.24% improvement across all algorithms and data sizes.
I need a bit of clarification.. (I think you are saying there is a regression?)
- `+UseDilithiumIntrinsics` should be redundant (i.e. `vm_version_x86.cpp`
should automatically detect and turn the feature on; see the sketch below for
a quick way to confirm).
- So if I read correctly.. the baseline measured already has the
original intrinsics (implicitly) enabled..
- therefore the 2.24% difference is just benchmark noise?
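
For reference, one way to double-check what the flag actually ended up as at
runtime (a minimal sketch, not part of the PR; it assumes a build that defines
`UseDilithiumIntrinsics`, and just uses the standard `HotSpotDiagnosticMXBean`
to report the flag's value and origin):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class DilithiumFlagCheck {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot = ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Prints the effective value and its origin (e.g. DEFAULT vs
        // VM_CREATION for a command-line flag), which shows whether
        // vm_version_x86.cpp enabled the intrinsics on its own or the
        // explicit +UseDilithiumIntrinsics flag changed anything.
        System.out.println(hotspot.getVMOption("UseDilithiumIntrinsics"));
    }
}
```

If the origin is DEFAULT in the baseline run, the explicit flag was indeed
redundant and the two runs were measuring the same code paths.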
In my measurements on AVX512 parts, I had seen between 0%->6% across
`SignatureBench.MLDSA`:
- (some variation on desktop-vs-server parts..)
- `SignatureBench.MLDSA.verify` was worse, only 0%->2% depending on key size
(IIRC, a bigger portion of the benchmark was in SHA3 instead)
- `SignatureBench.MLDSA.sign` was better, 4%->6% (also depending on data size)
That is also why I had included the other (now deleted) microbenchmark..
`SignatureBench.MLDSA` has a lot of 'other things' (e.g. SHA3) also happening,
so the AVX512 intrinsic changes were harder to differentiate from noise..
- I had measured ~25%-50% improvement on purely the 5 intrinsics changed..
(a rough sketch of the dilution effect is below)
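
To make the dilution point concrete: a plain sign loop at the `Signature` API
level, like the following (a minimal sketch, assuming a JDK with ML-DSA per
JEP 497; the class name, data size, and iteration counts are made up, and JMH
obviously does this properly), times the whole pipeline rather than just the
polynomial arithmetic:

```java
import java.security.*;

public class MLDSATimingSketch {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("ML-DSA-65");
        KeyPair kp = kpg.generateKeyPair();
        byte[] data = new byte[1024];

        Signature signer = Signature.getInstance("ML-DSA");
        // Warm up so C2 compiles the hot paths (and the intrinsics kick in).
        for (int i = 0; i < 5_000; i++) {
            signer.initSign(kp.getPrivate());
            signer.update(data);
            signer.sign();
        }

        int iters = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            signer.initSign(kp.getPrivate());
            signer.update(data);
            signer.sign();
        }
        long perOp = (System.nanoTime() - start) / iters;
        // Each sign() covers the whole ML-DSA pipeline: the Keccak/SHA3
        // hashing *and* the NTT/polynomial arithmetic the 5 intrinsics
        // cover, which is why an intrinsic-only win gets diluted here.
        System.out.println("sign: ~" + perOp + " ns/op");
    }
}
```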
Hence the claim 'never worse'.. A more precise claim would be:
- "New intrinsics seem to be better, but (at least for AVX512) the existing
intrinsics were already plenty good for MLDSA"
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3571871477