On Fri, Feb 5, 2021 at 11:07 AM Lasse Collin <lasse.col...@tukaani.org> wrote:
>
> On 2021-02-02 Brett Okken wrote:
> > Thus far I have only tested on jdk 11 64bit windows, but the fairly
> > clear winner is:
> >
> >     public void update(byte[] buf, int off, int len) {
> >         final int end = off + len;
> >         int i=off;
> >         if (len > 3) {
> >             switch (i & 3) {
> >                 case 3:
> >                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
> >                           (crc >>> 8);
> >                 case 2:
> >                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
> >                           (crc >>> 8);
> >                 case 1:
> >                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
> >                           (crc >>> 8);
> >             }
>
> To ensure (i & 3) == 0 when entering the main loop, the case-labels
> should be 1-2-3, not 3-2-1. This may have messed up your tests. :-(

Egads. I clearly just copied/pasted and did not think about it enough.
The results were still correct, but definitely only got aligned if the
offset was divisible by 2.


> With a very quick test I didn't see much difference if I changed the
> case-label order.

I re-ran with this fixed and the changes are not significant.

>
> On 2021-02-02 Brett Okken wrote:
> > I tested jdk 15 64bit and jdk 11 32bit, client and server and the
> > above implementation is consistently quite good.
> > The alternate in running does not do the leading alignment. This
> > version is really close in 64 bit testing and slightly faster for 32
> > bit. The differences are pretty small, and both are noticeably better
> > than my original proposal (and all 3 are significantly faster than
> > current). I think I would lead towards the simplicity of not doing the
> > leading alignment, but I do not have a strong opinion.
>
> Let's go with the simpler option.
>
> >         switch (len & 3) {
> >                 case 3:
> >                     crc = TABLE[0][(buf[i++] ^ (int) crc) & 0xFF] ^
> >                           (crc >>> 8);

Sounds good.

> I suppose this should use the same (faster) array indexing style as the
> main loop:
>
>     crc = TABLE[0][(buf[off++] & 0xFF) ^ ((int)crc & 0xFF)]
>           ^ (crc >>> 8);

Yes.

> Also, does it really help to unroll the loop? With 8191-byte buffers I
> see no significant difference (in a quick not-very-accurate test) if
> the switch-statement is replaced with a while-loop.

The differences are pretty minimal. My observation was switch a bit
faster than for loop, which was a bit faster than a while loop. But
the differences in averages were less than the confidence interval for
the given tests.

> With these two changes the code becomes functionally identical to the
> version I posted with the name "Modified slicing-by-4". Is that an OK
> version to commit?

Yes.

> Is the following fine to you as the file header? Your email address can
> be omitted if you prefer that. I will mention in the commit message
> that you adapted the code from XZ Utils and benchmarked it.
>
> /*
>  * CRC64
>  *
>  * Authors: Brett Okken <EMAIL>
>  *          Lasse Collin <EMAIL>
>  *
>  * This file has been put into the public domain.
>  * You can do whatever you want with this file.
>  */

That is fine. You can include my e-mail.

Brett

Reply via email to