> If the goal is replace md5sum, then one thing to think about is which digest
> will have the widest reach for everyone? Can all four versions be
> implemented in (mostly?) portable C code? Is performance the only real
> difference? Suppose we took just blake2s?

All four are available in mostly-portable C code, as well as in various
optimized versions: https://blake2.net/#dl and https://blake2.net/#sw .

The differences are:

1. The b's are more efficient on 64-bit architectures; the s's are more
   efficient on 32-bit architectures. Search for "blake2b" and "blake2s" in
   http://bench.cr.yp.to/results-hash.html .

   For example, on an Intel x86-64 Xeon E3-1275 V3
   (http://bench.cr.yp.to/results-hash.html#amd64-titan0), blake2b costs
   3.09 cycles per byte (cpb) and blake2s costs 5.35 cpb. On the other hand,
   on an NVIDIA ARM Tegra 250
   (http://bench.cr.yp.to/results-hash.html#armeabi-h2tegra), blake2b costs
   37.43 cpb and blake2s costs 13.49 cpb. (I looked at the worst-case
   quartile for 4096-byte inputs for those measurements.)

2. The b's can emit up to 512 bits of output; the s's can emit up to 256
   bits of output.

3. The 'p' versions use more cores and finish faster.

Interestingly, on my 64-bit, 4-CPU Intel Core i5 system (a Google Chromebook
Pixel 1), blake2sp is slightly faster than blake2bp. This might be because
with hyperthreading I have effectively 8 (?) efficient threads: blake2sp is
8-way while blake2bp is 4-way. Or maybe it is for some other reason.

Regards,

Zooko
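
P.S. For anyone who wants to check points 1 and 2 on their own machine, here is a rough sketch using Python's standard hashlib, which provides blake2b and blake2s (it does not provide the 'p' variants, so point 3 is not covered). The helper name throughput_mb_per_s, the 1 MiB buffer, and the round count are my own choices for illustration. It measures wall-clock MB/s rather than cycles per byte, so the numbers are not comparable to the bench.cr.yp.to figures; only the relative ordering on a given machine is of interest.

```python
import hashlib
import time


def throughput_mb_per_s(constructor, data, rounds=20):
    """Hash `data` `rounds` times; return a rough wall-clock rate in MB/s.

    Hypothetical helper for illustration, not a rigorous benchmark.
    """
    start = time.perf_counter()
    for _ in range(rounds):
        constructor(data).digest()
    elapsed = time.perf_counter() - start
    return (len(data) * rounds) / elapsed / 1e6


# Point 2: maximum output sizes, reported by hashlib in bytes.
print("blake2b max output bits:", hashlib.blake2b.MAX_DIGEST_SIZE * 8)
print("blake2s max output bits:", hashlib.blake2s.MAX_DIGEST_SIZE * 8)

# Point 1: relative speed on this machine (results vary by CPU and build).
data = b"\x00" * (1 << 20)  # 1 MiB of zeros; an arbitrary test input
for name, ctor in (("blake2b", hashlib.blake2b), ("blake2s", hashlib.blake2s)):
    print(name, round(throughput_mb_per_s(ctor, data), 1), "MB/s")
```

On a 64-bit machine I would expect blake2b to come out ahead, and on a 32-bit one blake2s, but that depends on how the underlying C code was built.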
