On Fri, Nov 16, 2012 at 02:21:14AM -0800, Colin Percival wrote:
> To be honest, I didn't spend a huge amount of time optimizing this code...

That's fine.  Your code is far more optimal and cleaner than most other
code out there, and we can optimize it further now. :-)

Unfortunately, some optimizations make it less readable (although some
others make it more readable), but that's why you also have a reference
implementation.  So I think we're OK making the -sse source file
slightly less readable.

> On 11/15/12 20:50, Solar Designer wrote:
> > I think having X as a local variable lets the compiler fully keep it in
> > registers, whereas having it passed into the function by reference may
> > result in unnecessary writes into the provided X array before the
> > function returns; it may also encourage the compiler to do such writes
> > inside the loop, especially since its iteration count is determined by r
> > and thus is not known at compile time (might be low).
>
> Makes sense.

I ended up replacing the X array with two pointers, X and Y, which point
to Bin and Bout array elements.  This avoids having to save a copy of X.
(In your original code this was done with a blkcpy().  In my older
revision of the code, it was an extra assignment in salsa20_8_xor().
Now neither is needed.)

Alexander
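
To illustrate the X/Y pointer idea described above, here is a rough
sketch in plain C, not the actual -sse code.  The names toy_mix_xor(),
blockmix_copy() and blockmix_ptr() are hypothetical, and toy_mix_xor()
is only a stand-in for the real Salsa20/8 core.  The first function
keeps a separate X block and copies it out on every iteration; the
second lets X point at the previously written Bout block (or at the
last Bin block on the first iteration), so no copy is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 16 /* one 64-byte block as 16 32-bit words */

/* Hypothetical stand-in for the Salsa20/8 core: out = mix(prev ^ in). */
static void toy_mix_xor(uint32_t *out, const uint32_t *prev,
    const uint32_t *in)
{
	uint32_t t[BLOCK_WORDS];
	size_t i;

	for (i = 0; i < BLOCK_WORDS; i++)
		t[i] = prev[i] ^ in[i];
	for (i = 0; i < BLOCK_WORDS; i++) {
		uint32_t v = t[i] + t[(i + 1) % BLOCK_WORDS];
		out[i] = (v << 7) | (v >> 25); /* rotate left by 7 */
	}
}

/* Copy-based variant: X is a separate block, copied out every iteration. */
static void blockmix_copy(const uint32_t *Bin, uint32_t *Bout, size_t r)
{
	uint32_t X[BLOCK_WORDS];
	size_t i;

	assert(r > 0);
	memcpy(X, &Bin[(2 * r - 1) * BLOCK_WORDS], sizeof(X));
	for (i = 0; i < 2 * r; i++) {
		toy_mix_xor(X, X, &Bin[i * BLOCK_WORDS]);
		/* even i goes to the first half of Bout, odd i to the second */
		memcpy(&Bout[(i / 2 + (i & 1) * r) * BLOCK_WORDS], X, sizeof(X));
	}
}

/* Pointer-based variant: X and Y point into Bin and Bout, no copy of X. */
static void blockmix_ptr(const uint32_t *Bin, uint32_t *Bout, size_t r)
{
	const uint32_t *X = &Bin[(2 * r - 1) * BLOCK_WORDS];
	size_t i;

	assert(r > 0);
	for (i = 0; i < 2 * r; i++) {
		uint32_t *Y = &Bout[(i / 2 + (i & 1) * r) * BLOCK_WORDS];

		toy_mix_xor(Y, X, &Bin[i * BLOCK_WORDS]);
		X = Y; /* the block just written is the next "previous" block */
	}
}
```

Both variants should produce the same Bout as long as Bin and Bout do
not overlap; the second one just skips the per-iteration copy.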
