I tested JDK 15 64-bit and JDK 11 32-bit, with both the client and server
VMs, and the above implementation is consistently quite good.
The alternative I am running (below) does not do the leading alignment.
It is very close in the 64-bit tests and slightly faster in the 32-bit
tests. The differences are small, and both are noticeably better than my
original proposal (and all three are significantly faster than the
current implementation). I would lean towards the simplicity of skipping
the leading alignment, but I do not have a strong opinion.
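For contrast, here is a rough sketch of what "leading alignment" could look like: consume single bytes until the array index is a multiple of 4, then run the same four-byte loop. This is a hypothetical illustration, not the aligned implementation being discussed (which is not shown here). The class name `AlignedCrc64Sketch`, the helper `step`, the CRC-64/XZ parameters (reflected polynomial 0xC96C5795D7870F42, state initialised to -1 and inverted on output) and the table layout are all assumptions. Note that aligning the index is not guaranteed to align the underlying memory address, since that also depends on the array header size of the VM.

```java
/**
 * Hypothetical "leading alignment" variant (illustration only).
 * Assumed parameters: CRC-64/XZ, reflected polynomial 0xC96C5795D7870F42.
 */
final class AlignedCrc64Sketch {
    private static final long POLY = 0xC96C5795D7870F42L;
    // TABLE[k][n] = CRC state after byte n followed by k zero bytes (assumed layout).
    static final long[][] TABLE = new long[4][256];

    static {
        for (int n = 0; n < 256; n++) {
            long c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            TABLE[0][n] = c;
        }
        for (int n = 0; n < 256; n++) {
            long c = TABLE[0][n];
            for (int t = 1; t < 4; t++) {
                c = TABLE[0][(int) c & 0xFF] ^ (c >>> 8); // feed one zero byte
                TABLE[t][n] = c;
            }
        }
    }

    // Process one byte with the basic (byte-at-a-time) table.
    static long step(long crc, byte b) {
        return TABLE[0][(b ^ (int) crc) & 0xFF] ^ (crc >>> 8);
    }

    // Returns the updated (not yet inverted) CRC state.
    static long update(long crc, byte[] buf, int off, int len) {
        final int end = off + len;
        int i = off;
        // Leading alignment: single bytes until i is a multiple of 4.
        while (i < end && (i & 3) != 0) {
            crc = step(crc, buf[i++]);
        }
        // Main loop: four bytes per iteration, as in the unaligned version.
        for (int j = end - 3; i < j; i += 4) {
            final int tmp = (int) crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)] ^
                  TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0xFF)] ^
                  (crc >>> 32) ^
                  TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0xFF)] ^
                  TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0xFF)];
        }
        // Trailing 0-3 bytes.
        while (i < end) {
            crc = step(crc, buf[i++]);
        }
        return crc;
    }
}
```

The extra leading loop adds a branch and up to three single-byte steps per call, which is the complexity the unaligned version avoids.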
    public void update(byte[] buf, int off, int len)
    {
        final int end = off + len;
        int i = off;
        // Main loop: fold four bytes per iteration through the four
        // tables; (crc >>> 32) carries the upper half of the 64-bit
        // state forward.
        for (int j = end - 3; i < j; i += 4) {
            final int tmp = (int) crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)] ^
                  TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0xFF)] ^
                  (crc >>> 32) ^
                  TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0xFF)] ^
                  TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0xFF)];
        }
        // Remaining 0-3 bytes, one at a time; the cases fall through
        // deliberately.
        switch (len & 3) {
            case 3:
                crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ (crc >>> 8);
            case 2:
                crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ (crc >>> 8);
            case 1:
                crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ (crc >>> 8);
        }
    }
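To check that the four-byte loop plus the fall-through tail gives the same result as a plain byte-at-a-time loop for every offset and length, the method can be dropped into a small standalone harness. This is a sketch under assumptions that are not in the message: `crc` is taken to be a `long` (implied by `crc >>> 32`), the parameters are assumed to be CRC-64/XZ (reflected polynomial 0xC96C5795D7870F42, state initialised to -1 and inverted on output), and the table construction, the class name `Crc64Slice4`, `getValue` and the `bytewise` reference are hypothetical.

```java
/**
 * Hypothetical harness around the four-byte update above.
 * Assumed: CRC-64/XZ parameters and TABLE[k][n] = state after byte n
 * followed by k zero bytes.
 */
final class Crc64Slice4 {
    private static final long POLY = 0xC96C5795D7870F42L;
    static final long[][] TABLE = new long[4][256];

    static {
        for (int n = 0; n < 256; n++) {
            long c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            TABLE[0][n] = c;
        }
        for (int n = 0; n < 256; n++) {
            long c = TABLE[0][n];
            for (int t = 1; t < 4; t++) {
                c = TABLE[0][(int) c & 0xFF] ^ (c >>> 8); // feed one zero byte
                TABLE[t][n] = c;
            }
        }
    }

    private long crc = -1L; // assumed initial value

    public void update(byte[] buf, int off, int len) {
        final int end = off + len;
        int i = off;
        for (int j = end - 3; i < j; i += 4) {
            final int tmp = (int) crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)] ^
                  TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0xFF)] ^
                  (crc >>> 32) ^
                  TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0xFF)] ^
                  TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0xFF)];
        }
        switch (len & 3) {
            case 3:
                crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ (crc >>> 8);
                // fall through
            case 2:
                crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ (crc >>> 8);
                // fall through
            case 1:
                crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^ (crc >>> 8);
                break;
            default:
                break;
        }
    }

    // Final CRC value (assumed output inversion).
    public long getValue() {
        return ~crc;
    }

    // Reference implementation: one table lookup per byte.
    static long bytewise(byte[] buf, int off, int len) {
        long c = -1L;
        for (int i = off; i < off + len; i++) {
            c = TABLE[0][(buf[i] ^ (int) c) & 0xFF] ^ (c >>> 8);
        }
        return ~c;
    }
}
```

Under these assumed parameters the standard check input "123456789" should produce 0x995DC9BBDF1939FA, so a harness like this can confirm correctness before any timing runs.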