On Wed, 7 Jan 2026 00:18:43 GMT, Volodymyr Paprotski <[email protected]> 
wrote:

> "Insert 0b0000 nibble after every third nibble". I only have two questions, 
> looks good otherwise.

Yes, that is the idea.

> 
> PS: things I've considered:
> 
> * Loop controls?
>   
>   * ML_KEM.java guarantees  (per callee comment and assert) lengths are 
> multiple of 64
>   * also same as original code
> * Why not simply a vpermb? Have zeroes already from the masked load with k1..

It *is* using vpermb (evpermb() generates the EVEX encoded VPERMB)

>   
>   * shuffle granularity is actually 4-bits, not 8-bits

Really? In what instruction? I hadn't found it in the manual.

> * logical shift already zeroes top bits, so `vpand` not required?

Only every 2nd byte is shifted, the rest needs to be masked.
>   
>   * odd columns not shifted, so still have extra bits that need clearing

Yes, that is what the vpand does. (actually, it also (unnecessarily) masks the 
shifted bytes.

> * Why VBMI?
>   
>   * needed for `evpermb`

Yes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3718842604

Reply via email to