Tim Peters <t...@python.org> added the comment:

[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
>     for k in range(68):
>         for _ in range(256):
>             comb(k, 67)
>             comb(k, 67)
>             ... # 64 repetitions of comb(k, 67) in all

I'm assuming you meant to write comb(67, k) instead, since comb(k, 67) as 
written is 0 for every tested k value except k=67, so it almost never executes 
any of the code in question.
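A minimal sketch of the corrected benchmark, assuming the `comb` passed in is `math.comb` and collapsing the 64 unrolled calls into one call per inner iteration (the function names `f_as_posted` and `f_corrected` are hypothetical). It also checks the observation above: with the arguments in the quoted order, only k=67 gives a nonzero result.

```python
import math
import timeit

def f_as_posted(comb=math.comb):
    # Arguments in the quoted order: comb(k, 67) is 0 for every k < 67,
    # so nearly all calls return early.
    for k in range(68):
        for _ in range(256):
            comb(k, 67)

def f_corrected(comb=math.comb):
    # Presumed intent: n fixed at 67, k varying over 0..67.
    for k in range(68):
        for _ in range(256):
            comb(67, k)

# Which k in 0..67 give a nonzero comb(k, 67)?
nonzero = [k for k in range(68) if math.comb(k, 67) != 0]
print(nonzero)  # [67]

for func in (f_as_posted, f_corrected):
    print(func.__name__, min(timeit.repeat(func, number=10, repeat=3)))
```

Absolute timings will vary by machine; the point is only that the two argument orders exercise very different code paths.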

It's surprising to me that even the long-winded popcount code was faster! The 
other way needs to read only 3 one-byte values from a trailing-zero table, but 
the long-winded popcount emulation needs to read 4 four-byte mask constants (or 
are they embedded in the instruction stream?), in addition to doing many more 
bit-fiddling operations (4 shifts, 4 "&" masks, 3 adds/subtracts, and a 
multiply, compared to just 2 adds/subtracts).
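Neither implementation is shown in this message. As a sketch, assume both compute the power of 2 dividing comb(n, k) for n <= 67, and that the trailing-zero table holds the exponent of 2 in m! for each m (one plausible reading). Legendre's formula for p = 2 gives that exponent as m - popcount(m), so the table route is 3 byte reads and 2 subtractions, while the popcount route reduces to popcount(k) + popcount(n-k) - popcount(n). `popcount32` below is the standard SWAR bit count; its 4 shifts, 4 masks, 3 adds/subtracts, 1 multiply and 4 mask constants match the tally above. The names `FACT_TZ`, `shift_from_table` and `shift_from_popcount` are hypothetical.

```python
import math

def v2_factorial(m):
    # Legendre's formula for p = 2: the exponent of 2 in m! is m - popcount(m).
    return m - bin(m).count("1")

# One byte per entry suffices: the largest value, v2(67!), is 64.
FACT_TZ = bytes(v2_factorial(m) for m in range(68))

def shift_from_table(n, k):
    # Three 1-byte table reads and two subtractions.
    return FACT_TZ[n] - FACT_TZ[k] - FACT_TZ[n - k]

def popcount32(x):
    # Classic SWAR popcount for a 32-bit word.
    x = x - ((x >> 1) & 0x55555555)
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F
    # Truncate to 32 bits to mimic C unsigned overflow before the final shift.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24

def shift_from_popcount(n, k):
    # The m terms of v2(n!) - v2(k!) - v2((n-k)!) cancel, leaving popcounts only.
    return popcount32(k) + popcount32(n - k) - popcount32(n)

def v2(x):
    # Index of the lowest set bit, i.e. the exponent of 2 dividing x > 0.
    return (x & -x).bit_length() - 1

for n in range(68):
    for k in range(n + 1):
        expected = v2(math.comb(n, k))
        assert shift_from_table(n, k) == expected
        assert shift_from_popcount(n, k) == expected
print("both methods agree for n <= 67")
```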

So if the results are right, Intel timings make no sense to me at all ;-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37295>
_______________________________________