> If the goal is replace md5sum, then one thing to think about is which digest 
> will have the widest reach for everyone?  Can all four versions be 
> implemented in (mostly?) portable C code?  Is performance the only real 
> difference?  Suppose we took just blake2s?

All four are available in mostly-portable C code, as well as in
various optimized versions: https://blake2.net/#dl and
https://blake2.net/#sw .

The differences are:

1. The b's are more efficient on 64-bit architectures; the s's are
more efficient on 32-bit architectures. Search for "blake2b" and
"blake2s" in http://bench.cr.yp.to/results-hash.html .

   For example, on an Intel x86-64 Xeon E3-1275 V3
(http://bench.cr.yp.to/results-hash.html#amd64-titan0), blake2b costs
3.09 cpb and blake2s costs 5.35 cpb.

   On the other hand, on an NVIDIA ARM Tegra 250
(http://bench.cr.yp.to/results-hash.html#armeabi-h2tegra), blake2b
costs 37.43 cpb and blake2s costs 13.49 cpb.

   (I looked at the worst-case quartile for 4096-byte inputs for those
measurements.)

2. The b's can emit up to 512 bits of output; the s's can emit up to
256 bits of output.

3. The 'p' (parallel) versions spread the work across more cores and
finish faster.
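
To make point 2 concrete, here is a small sketch using Python's
hashlib. That is my choice for illustration only; the question above is
about C implementations. It assumes a Python build whose hashlib
includes blake2b and blake2s (3.6 or later). The input string is
arbitrary.

```python
import hashlib

data = b"hello world"  # arbitrary sample input

# Maximum output sizes in bytes: 64 bytes = 512 bits, 32 bytes = 256 bits.
max_b = hashlib.blake2b.MAX_DIGEST_SIZE
max_s = hashlib.blake2s.MAX_DIGEST_SIZE

# With no digest_size argument, each variant emits its maximum length.
full_b = hashlib.blake2b(data).digest()
full_s = hashlib.blake2s(data).digest()

# Either variant can also be asked for a shorter digest, e.g. 16 bytes.
short_b = hashlib.blake2b(data, digest_size=16).digest()

# Asking blake2s for more than 32 bytes is rejected.
try:
    hashlib.blake2s(data, digest_size=64)
    too_big_ok = True
except ValueError:
    too_big_ok = False

print(max_b * 8, max_s * 8)
print(len(full_b), len(full_s), len(short_b), too_big_ok)
```

So anything that needs more than 256 bits of output has to use one of
the b's.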

Interestingly, on my 64-bit, 4-CPU Intel Core i5 system (a Google
Chromebook Pixel 1) blake2sp is slightly faster than blake2bp. This
might be because with hyperthreading I have effectively 8 (?)
efficient threads. blake2sp is 8-way while blake2bp is 4-way. Or maybe
it is for some other reason.
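
If you want a rough feel for the b-versus-s difference on your own
machine, something like the following sketch may help. It uses Python's
hashlib, which as far as I know exposes only blake2b and blake2s, so it
says nothing about the 'p' versions. The helper name, buffer size and
round count are my own choices, and this is far cruder than the
bench.cr.yp.to measurements: it reports MB/s including interpreter
overhead, not cycles per byte.

```python
import hashlib
import time


def throughput_mb_s(hash_ctor, size=4096, rounds=5000):
    """Hash `rounds` buffers of `size` bytes; return a rough MB/s figure."""
    buf = b"\x00" * size
    start = time.perf_counter()
    for _ in range(rounds):
        hash_ctor(buf).digest()
    elapsed = time.perf_counter() - start
    return (size * rounds) / elapsed / 1e6


# Measure both sequential variants with 4096-byte inputs.
results = {
    name: throughput_mb_s(getattr(hashlib, name))
    for name in ("blake2b", "blake2s")
}

for name, rate in results.items():
    print(f"{name}: {rate:.1f} MB/s")
```

On a 64-bit machine I would expect blake2b to come out ahead, and on a
32-bit machine blake2s, but the numbers will vary with the build and
the CPU.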

Regards,

Zooko
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev