Mark Dickinson <[EMAIL PROTECTED]> added the comment: Just to be clear: this patch halves the number of shifts and masks, asymptotically; it doesn't affect the number of adds and multiplies (except for saving a few additions to zero by setting the initial carry intelligently). Is that correct? It's quite surprising that the bit operations have such a large effect on the running time.
Perhaps this is an argument for considering changing PyLong_SHIFT to 16 (or better, 32, since surely almost every compiler provides uint64_t or equivalent these days)? Though that would be quite an involved task. While I believe the idea is sound, the patch isn't quite correct---it's got some assertions that won't always hold. For example, with the patch, in a debug build of Python, I get: >>> a = b = 2**30-1 [36413 refs] >>> a*b Assertion failed: (carry <= PyLong_MASK), function x_mul, file Objects/longobject.c, line 2228. Abort trap I'm fairly sure that the assert(carry <= PyLong_MASK) doesn't matter, and that the assertion at the end of the main outer loop (assert(carry >> PyLong_SHIFT) == 0)) should still hold. But some more comments and analysis in the code would be good. For example, if carry >= PyLong_MASK the the comment that 2*MASK + 2*MASK*MASK is contained in a twodigit isn't so useful. How big can carry get? ---------- nosy: +marketdickinson _______________________________________ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3944> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com