On Wed, 7 Jan 2026 16:36:52 GMT, Volodymyr Paprotski <[email protected]> 
wrote:

>>> "Insert 0b0000 nibble after every third nibble". I only have two questions, 
>>> looks good otherwise.
>> 
>> Yes, that is the idea.
>> 
>>> 
>>> PS: things I've considered:
>>> 
>>> * Loop controls?
>>>   
>>>   * ML_KEM.java guarantees  (per callee comment and assert) lengths are 
>>> multiple of 64
>>>   * also same as original code
>>> * Why not simply a vpermb? Have zeroes already from the masked load with 
>>> k1..
>> 
>> It *is* using vpermb (evpermb() generates the EVEX encoded VPERMB)
>> 
>>>   
>>>   * shuffle granularity is actually 4-bits, not 8-bits
>> 
>> Really? In what instruction? I hadn't found it in the manual.
>> 
>>> * logical shift already zeroes top bits, so `vpand` not required?
>> 
>> Only every 2nd byte is shifted, the rest needs to be masked.
>>>   
>>>   * odd columns not shifted, so still have extra bits that need clearing
>> 
>> Yes, that is what the vpand does. (actually, it also (unnecessarily) masks 
>> the shifted bytes.
>> 
>>> * Why VBMI?
>>>   
>>>   * needed for `evpermb`
>> 
>> Yes.
>
>> > "Insert 0b0000 nibble after every third nibble". I only have two 
>> > questions, looks good otherwise.
> 
> @ferakocz apologies for the misunderstanding; everything after the PS was not 
> a request for change.. those were the questions that occurred to me and I 
> found the answer.. The only reason I put them in was for the next reviewer. 
> Or if I am wrong, e.g. no, I did not find a better instruction than vpermb 
> either. (My first reaction to seeing the java code, was 'oh, this is easy, 
> just a `vpermb`, then had to reason out why not..)

@vpaprotsk I've reran related regression tests and benchmarks after 
implementing your code review comments and remerging with the master branch.  
These have all came back with the expected results.  Could you reapprove after 
the merge commit?  Thank you.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3730835813

Reply via email to