On Wed, 7 Jan 2026 00:18:43 GMT, Volodymyr Paprotski <[email protected]> wrote:
> "Insert 0b0000 nibble after every third nibble". I only have two questions,
> looks good otherwise.

Yes, that is the idea.

> PS: things I've considered:
>
> * Loop controls?
>
>   * ML_KEM.java guarantees (per callee comment and assert) lengths are
>     multiple of 64
>   * also same as original code
> * Why not simply a vpermb? Have zeroes already from the masked load with k1..

It *is* using vpermb (evpermb() generates the EVEX encoded VPERMB)

>   * shuffle granularity is actually 4-bits, not 8-bits

Really? In what instruction? I hadn't found it in the manual.

> * logical shift already zeroes top bits, so `vpand` not required?

Only every 2nd byte is shifted, the rest needs to be masked.

>   * odd columns not shifted, so still have extra bits that need clearing

Yes, that is what the vpand does. (actually, it also (unnecessarily) masks the shifted bytes.)

> * Why VBMI?
>
>   * needed for `evpermb`

Yes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3718842604
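
For readers following the thread, here is a minimal scalar sketch of the "insert 0b0000 nibble after every third nibble" idea discussed above. It is not the code from the PR: the class and method names are hypothetical, and the byte layout (little-endian 12-bit values, three packed bytes per two outputs) is an assumption. It only models the three steps mentioned in the discussion: a byte permute that places the needed bytes in each 16-bit lane, a logical right shift applied to every second lane only, and a mask that clears the leftover top nibble.

```java
// Hypothetical illustration only; not the intrinsic or the ML_KEM.java code.
final class TwelveToSixteenSketch {

    // Expands packed 12-bit values into 16-bit values (top nibble zero).
    // Assumes little-endian packing: 3 input bytes yield 2 output shorts.
    static short[] expand(byte[] packed) {
        if (packed.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] out = new short[packed.length / 3 * 2];
        for (int i = 0, j = 0; i < packed.length; i += 3, j += 2) {
            int b0 = packed[i] & 0xFF;
            int b1 = packed[i + 1] & 0xFF;
            int b2 = packed[i + 2] & 0xFF;

            // "Permute" step: the middle byte is used by both 16-bit lanes.
            int evenLane = b0 | (b1 << 8);
            int oddLane = b1 | (b2 << 8);

            // "Shift" step: only every second lane is shifted right by 4.
            oddLane >>>= 4;

            // "Mask" step: the unshifted lane still carries 4 extra bits.
            // Masking the shifted lane as well is redundant but harmless.
            out[j] = (short) (evenLane & 0x0FFF);
            out[j + 1] = (short) (oddLane & 0x0FFF);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] r = expand(new byte[] {0x21, 0x43, 0x65});
        System.out.println(Integer.toHexString(r[0]) + " " + Integer.toHexString(r[1]));
    }
}
```

In this model the mask is what clears the unshifted lanes, which matches the point made above that the shift alone is not enough.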
