Patches item #936813, was opened at 2004-04-17 10:16 Message generated for change (Settings changed) made by birkenfeld You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=936813&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Submitted By: Trevor Perrin (trevp) >Assigned to: Tim Peters (tim_one) Summary: fast modular exponentiation Initial Comment: For crypto-sized numbers, Python mod-exp is several times slower than GMP or OpenSSL (6x or more). Those libraries do crazy special-case stuff, + assembly, platform-specific tuning, and so on. However, there's some low-hanging fruit: this patch has a few basic optimizations giving a 2-3x speedup for numbers in the 1000-8000 bit range (that's what I've mostly tested; but the patch should improve, or at least not hurt, everything else): - x_mul() is special-cased for squaring, which is almost twice as fast as general multiplication. - x_mul() uses pointers instead of indices for iteration, giving ~10% speedup (under VC6). - the right-to-left square-and-multiply exponentiation algorithm is replaced with a left-to-right square-and-multiply, which takes advantage of small bases. - when the exponent is above a certain size, "k-ary" exponentiation is used to reduce the number of multiplications via precalculation. - when the modulus is odd, Montgomery reduction is used. - the Karatsuba cutoff seems too low. For multiplicands in the range of 500-5000 bits, Karatsuba slows multiplication by around ~25% (VC6sp4, Intel P4M 1.7 Ghz). For larger numbers, the benefits of Karatsuba are less than they could be. Currently, the cutoff is 35 digits (525 bits). I've tried 70, 140, 280, and 560. 70, 140, and 280 are roughly the same: they don't slow down small values, and they have good speedup on large ones. 560 is not quite as good for large values, but at least it doesn't hurt small ones. I know this is platform-dependent, but I think we should err on the side of making the cutoff too high and losing some optimization, instead of putting it too low and slowing things down. I suggest 70. A couple misc. things: - Negative exponents with a modulus no longer give an error, when the base is coprime with the modulus. Instead, it calculates the multiplicative inverse of the base with respect to the modulus, using the extended euclidean algorithm, and exponentiates that. Libraries like GMP and LibTomMath work the same way. Being able to take inverses mod a number is useful for cryptography (e.g. RSA, DSA, and Elgamal). - The diff includes patch 923643, which supports converting longs to byte-strings. Ignore the last few diff entries, if you don't want this. - I haven't looked into harmonizing with int_pow(). Something may have to be done. ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2005-09-29 08:29 Message: Logged In: YES user_id=973611 I updated this patch to CVS head, but didn't change it otherwise. It's still a bit hairy. However, it's also still a big speedup (see benchmarks from 2004-10-03). If I can do anything to help this make it in 2.5, let me know. ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2004-10-05 09:25 Message: Logged In: YES user_id=973611 Montgomery has a fixed cost, so it slows down small exponents. For example modular squaring is slowed ~5x. I added a MONTGOMERY_CUTOFF to take care of this. Submitting long_mont4.diff. ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2004-10-04 09:48 Message: Logged In: YES user_id=973611 oops. Good thing for random testing, carry propagation was buggy. Submitting long_mont3.diff. ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2004-10-04 07:43 Message: Logged In: YES user_id=973611 I did more code review, testing, and timing. The only change in this new patch (long_mont2.diff) is a couple "int"s were changed to "digits"s, and it's against CVS head. As far as testing, I used the random module and GMPY to check it on ~3 million random input values. That's about an hour of testing. I'll leave the tests running for a few days and see if anything crops up. As far as timing, I updated the benchmarks with a new machine (OpenBSD): http://trevp.net/long_pow/ On 3 different machines, Montgomery gives a speedup of 2x, 3x, and 4x. That dwarfs what we've done so far, so I'm crossing my fingers for 2.4. Let me know if I can explain or improve the code, or anything.. (The below crypto library comes with a "book" which has an explanation of Montgomery I found helpful): http://math.libtomcrypt.org/download.html ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2004-09-13 10:20 Message: Logged In: YES user_id=973611 Here's the 3rd part of the patch (long_mont.diff; Montgomery Reduction), diff'd against 2.4a3 and cleaned up a bit. Note that this doesn't include negative exponent handling. If this patch is accepted, I'll make a new tracker item for that, since it's not an optimization, just an "opportunistic feature" (it builds on one of the helper functions needed for Montgomery). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2004-08-30 04:47 Message: Logged In: YES user_id=31435 Same deal with the 2nd part of the patch (major format changes, minor code changes). Incidentally fixed an old leak bug in long_pow() during the review. Added code to raise a compile-time error (C) if SHIFT isn't divisible by 5, and removed long_pow's new hardcoded assumption that SHIFT is exactly 15. Include/longintrepr.h 2.16 Misc/NEWS 1.1120 Objects/longobject.c 1.163 This is cool stuff (& thank you!), but I'm sorry to say I can't foresee making time for the 3rd part of the patch for weeks. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2004-08-30 00:21 Message: Logged In: YES user_id=31435 Checked in the first part of the patch, with major format changes (Python's C coding standard is hard 8-column tabs), and minor code changes: Include/longintrepr.h 2.15 Misc/ACKS 1.280 Misc/NEWS 1.1119 Objects/longobject.c 1.162 I don't know whether it's possible for me to get to part 2 of the patch before 2.4a3, but I would like to. It seems plainly impossible that I'll be able to get to part 3 before 2.4a3. ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2004-07-22 10:39 Message: Logged In: YES user_id=973611 Pragmatics isn't my strong suit... but I get your drift :-). I split it into 3 diffs: 1) x_mul optimizations: (pointers instead of indices, special-case squaring, changing Karatsuba cutoff) 2) rewriting long_pow() for left-to-right 5-ary 3) Montgomery reduction. This also includes l_invmod(), since it's necessary for Montgomery. I've left out the code which exposes l_invmod() to the user (and associated docs, tests, and intobject changes). We could slap that on afterwards or not... Anyways, these are applied sequentially: longobject.c + longobject1.diff = longobject1.c longobject1.c + longobject2.diff = longobject2.c longobject2.c + longobject2.diff = longobject3.c Should I open new tracker items for them? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2004-07-21 21:29 Message: Logged In: YES user_id=31435 Pragmatics are a real problem here, Trevor. I don't foresee being able to make a solid block of sufficient hours to give to reviewing this before Python 2.4 is history (which is why I've left this patch unassigned, BTW -- I just can't promise to make enough time). So if nobody else can volunteer to review it, that alone is likely to leave the patch sitting here unapplied. But there are several independent changes in this patch, and it *could* be broken into several smaller patches. I tossed that bait out before, but you didn't bite. You should <wink>. ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2004-07-19 13:00 Message: Logged In: YES user_id=973611 Tim, thanks for the feedback. I'm uploading a new patch against CVS latest that fixes those issues, and adds docs and tests. Also, I cleaned up the code quite a bit, and got it properly handling (I hope) all the varied combinations of ints/longs, positives/negatives/zeros etc.. Unfortunately, Montgomery is the bulk of the speedup: http://trevp.net/long_pow/ But I could split out the negative exponent handling into a separate patch, if that would be easier. Anyways, I'd like to add more tests for the exponentiation stuff. Aside from that, I think the patch is complete. And more robust than previously, though I still wouldn't trust it until another person or two gives it a serious looking-over.... ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2004-07-17 05:06 Message: Logged In: YES user_id=31435 Notes after a brief eyeball scan: Note that the expression a & 1 == 1 groups as a & (1 == 1) in C -- comparisons have higher precedence in C than bit- fiddling operators. Stuff like that is usually best resolved by explicitly parenthesizing any "impure" expression fiddling with bits. In this case, in a boolean expression plain a & 1 has the hoped-for effect. and is clearer anyway. Would be better to use "**" than "^" in comments when exponentiation is intended, since "^" means xor in both Python and C. Doc changes are needed, because you're changing visible semantics in some cases. Tests are needed, especially for new semantics. l_invmod can return NULL for more than one reason, but one of its callers ignores this, assuming that all NULL returns are due to lack of coprimality. It's unreasonable to, e.g., replace a MemoryError with a complaint about coprimality; this needs reworking. l_invmod should probably set an exception in the "not coprime" case. As is, it's a weird function, sometimes setting an exception when it returns NULL, but not setting one when coprimality doesn't obtain. That makes life difficult for callers (which isn't apparent in the patch, because its callers are currently ignoring this issue). The Montgomery reduction gimmicks grossly complicate this patch -- they're long-winded and hard to follow. That may come with the territory, but it's the only part of the patch that made me want to vomit <wink>. I'd be happier if it weren't there, for aesthetic, clarity, and maintainability reasons. How much of a speedup does it actually buy? You're right that int pow must deliver the same results as long pow, so code is needed for that too. "short int" versus "unbounded int" is increasingly meant to be an invisible internal implementation detail in Python. I'm also in favor of giving this meaning to modular negative exponents, btw, so no problem with that. An easy way would be to change int pow to delegate to long pow when this is needed. Pragmatics: there's a better chance of making 2.4 if the patch were done in bite-size stages. For example, no doc changes are needed to switch to 5-ary left-to-right exponentation, and that has no effect on the int implementation either, etc. A patch that did just that much probably would have gone in a long time ago. ---------------------------------------------------------------------- Comment By: Trevor Perrin (trevp) Date: 2004-07-13 10:04 Message: Logged In: YES user_id=973611 Uploading 2nd version of longobject.diff - the only change is that patch 923643 is removed from this diff. That was a diff for converting longs to byte-strings, which I unnecessarily left in. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=936813&group_id=5470 _______________________________________________ Patches mailing list [email protected] http://mail.python.org/mailman/listinfo/patches
