Tim Peters <t...@python.org> added the comment:
[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
>     for k in range(68):
>         for _ in range(256):
>             comb(k, 67)
>             comb(k, 67)
>             ... # 64 repetitions of comb(k, 67) in all

I'm assuming you meant to write comb(67, k) instead, since the comb(k, 67) given is 0 at all tested k values except for k=67, and almost never executes any of the code in question.

It's surprising to me that even the long-winded popcount code was faster! The other way needs to read up 3 1-byte values from a trailing zero table, but the long-winded popcount emulation needs to read up 4 4-byte mask constants (or are they embedded in the instruction stream?), in addition to doing many more bit-fiddling operations (4 shifts, 4 "&" masks, 3 add/subtract, and a multiply - compared to just 2 add/subtract).

So if the results are right, Intel timings make no sense to me at all ;-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37295>
_______________________________________
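For anyone following along, here is a small sketch of the two points made in the comment above. The first part checks the argument-order claim using math.comb. The second is a Python rendering of the usual mask-and-multiply 32-bit popcount, which appears to match the operation counts given (4 shifts, 4 "&" masks, 3 add/subtract, 1 multiply, 4 mask constants). The names as_written, as_intended and popcount32 are made up for illustration, and the C code actually being timed in the issue may differ in detail.

```python
from math import comb

# comb(n, k) counts k-element subsets of an n-element set, so it is 0
# whenever k > n.  With the arguments as written, only k == 67 is nonzero.
as_written = [comb(k, 67) for k in range(68)]
as_intended = [comb(67, k) for k in range(68)]


def popcount32(x):
    """Illustrative 32-bit popcount emulation (hypothetical helper).

    Assumed to resemble the "long-winded" fallback discussed above; it is
    not copied from the patch under review.
    """
    x &= 0xFFFFFFFF  # Python ints are unbounded; C would use uint32_t
    # 1 shift, 1 mask, 1 subtract: 2-bit partial counts
    x = x - ((x >> 1) & 0x55555555)
    # 1 shift, 2 masks, 1 add: 4-bit partial counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    # 1 shift, 1 mask, 1 add: 8-bit partial counts
    x = (x + (x >> 4)) & 0x0F0F0F0F
    # 1 multiply, 1 shift: sum the four bytes into the top byte.
    # The extra "& 0xFFFFFFFF" only emulates 32-bit overflow in Python.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24
```

The constants 0x55555555, 0x33333333, 0x0F0F0F0F and 0x01010101 would be the four 4-byte values referred to above. Whether a compiler loads them from memory or encodes them as immediates in the instruction stream depends on the target and is not something this sketch can show.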