On Fri, Feb 5, 2021 at 11:07 AM Lasse Collin <lasse.col...@tukaani.org> wrote: > > On 2021-02-02 Brett Okken wrote: > > Thus far I have only tested on jdk 11 64bit windows, but the fairly > > clear winner is: > > > > public void update(byte[] buf, int off, int len) { > > final int end = off + len; > > int i=off; > > if (len > 3) { > > switch (i & 3) { > > case 3: > > crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ > > (crc >>> 8); > > case 2: > > crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ > > (crc >>> 8); > > case 1: > > crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ > > (crc >>> 8); > > } > > To ensure (i & 3) == 0 when entering the main loop, the case-labels > should be 1-2-3, not 3-2-1. This may have messed up your tests. :-(
Egads. I clearly just copied/pasted and did not think about it enough. The results were still correct, but definitely only got aligned if the offset was divisible by 2. > With a very quick test I didn't see much difference if I changed the > case-label order. I re-ran with this fixed and the changes are not significant. > > On 2021-02-02 Brett Okken wrote: > > I tested jdk 15 64bit and jdk 11 32bit, client and server and the > > above implementation is consistently quite good. > > The alternate in running does not do the leading alignment. This > > version is really close in 64 bit testing and slightly faster for 32 > > bit. The differences are pretty small, and both are noticeably better > > than my original proposal (and all 3 are significantly faster than > > current). I think I would lead towards the simplicity of not doing the > > leading alignment, but I do not have a strong opinion. > > Let's go with the simpler option. > > > switch (len & 3) { > > case 3: > > crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ > > (crc >>> 8); Sounds good. > I suppose this should use the same (faster) array indexing style as the > main loop: > > crc = TABLE[0][(buf[off++] & 0xFF) ^ ((int)crc & 0xFF)] > ^ (crc >>> 8); Yes. > Also, does it really help to unroll the loop? With 8191-byte buffers I > see no significant difference (in a quick not-very-accurate test) if > the switch-statement is replaced with a while-loop. The differences are pretty minimal. My observation was switch a bit faster than for loop, which was a bit faster than a while loop. But the differences in averages were less than the confidence interval for the given tests. > With these two changes the code becomes functionally identical to the > version I posted with the name "Modified slicing-by-4". Is that an OK > version to commit? Yes. > Is the following fine to you as the file header? Your email address can > be omitted if you prefer that. I will mention in the commit message > that you adapted the code from XZ Utils and benchmarked it. > > /* > * CRC64 > * > * Authors: Brett Okken <EMAIL> > * Lasse Collin <EMAIL> > * > * This file has been put into the public domain. > * You can do whatever you want with this file. > */ That is fine. You can include my e-mail. Brett