New submission from Michał Górny <mgo...@gentoo.org>:

The setup.py file for Python states:
    if (not cross_compiling and
            os.uname().machine == "x86_64" and
            sys.maxsize > 2**32):
        # Every x86_64 machine has at least SSE2.  Check for sys.maxsize
        # in case that kernel is 64-bit but userspace is 32-bit.
        blake2_macros.append(('BLAKE2_USE_SSE', '1'))

While the assertion about every x86_64 machine having SSE2 is true, that does not mean SSE2 is worthwhile to use. I've tested the pure SSE2 implementation (i.e. without SSSE3 and so on) on three different machines, getting the following results:

Athlon64 X2 (SSE2 is the best supported variant), 540 MiB of data:

  SSE2: [5.189988004000043, 5.070812243997352]
  ref:  [2.0161159170020255, 2.0475422790041193]

Core i3, same data file:

  SSE2: [1.924425926999902, 1.92461746999993, 1.9298037500000191]
  ref:  [1.7940209749999667, 1.7900855569999976, 1.7835538760000418]

Xeon E5630 server, 230 MiB data file:

  SSE2: [0.7671358410007088, 0.7797677099879365, 0.7648976119962754]
  ref:  [0.5784736709902063, 0.5717909929953748, 0.5717219939979259]

So in all the tested cases, the pure SSE2 implementation is *slower* than the reference implementation. SSSE3 and the other variants are faster, and AFAIU they are enabled automatically based on CFLAGS, so this doesn't matter for most systems. However, for old CPUs that do not support SSSE3, the choice of SSE2 makes the algorithm prohibitively slow -- on the Athlon64 X2 it's about 2.5 times slower than the reference implementation!

----------
components: Extension Modules
messages: 304696
nosy: mgorny
priority: normal
severity: normal
status: open
title: BLAKE2: the (pure) SSE2 impl forced on x86_64 is slower than reference
type: performance
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31834>
_______________________________________
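The report does not say how the timings were collected; the result lists resemble the output of timeit.repeat. A minimal sketch of one way to gather comparable numbers follows. It assumes hashlib.blake2b is the function being measured, and the helper name time_blake2b, the 1 MiB chunk size, and the zero-filled stand-in payload are all hypothetical choices, not taken from the report.

```python
import hashlib
import timeit


def time_blake2b(data, repeat=3, chunk_size=1 << 20):
    """Return `repeat` wall-clock timings (in seconds) for hashing `data` once.

    Hypothetical helper: the original report does not describe its harness.
    """
    view = memoryview(data)  # slice without copying the payload

    def hash_once():
        h = hashlib.blake2b()
        # Feed the data in fixed-size chunks, as one would when reading a file.
        for start in range(0, len(view), chunk_size):
            h.update(view[start:start + chunk_size])
        return h.digest()

    # number=1: each timing covers exactly one full pass over the data.
    return timeit.repeat(hash_once, number=1, repeat=repeat)


if __name__ == "__main__":
    # Assumption: 16 MiB of zero bytes stands in for the report's data files.
    payload = bytes(16 * 1024 * 1024)
    print(time_blake2b(payload))
```

Comparing the SSE2 and reference code paths this way would require running the same script against two interpreter builds, one compiled with BLAKE2_USE_SSE defined and one without; the script itself cannot switch implementations.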