Tim Peters wrote: > ... > Someone want a finite project that would _really_ help their Uncle > Timmy in his slow-motion crusade to get Python on the list of "solved > it!" languages for each problem on that magnificent site? ... > Turns out it's _not_ input speed that's the problem here, and not even > mainly the speed of integer mod: the bulk of the time is spent in > int(string)....
OK, I got an idea about how to do this fast. I started with Python code, and I now have C code that should beat int(string) always while getting a lot of speed making long values. The working tables can be used to do the reverse transformation (int or long to string in some base) with a little finesse, but I haven't done that yet in C. The code is pretty sprawly now (a lot left in for instrumentation and testing pieces), but can eventually get smaller. I gave myself time to do this as a birthday present to myself. It may take a while to build a patch, but perhaps you can let me know how much speedup you get using this code. if you build this module, I'd suggest using "from to_int import chomp" to get a function that works like int (producing a long when needed and so on). > If you can even track all the levels of C function calls that ends up > invoking <wink>, you find yourself in PyOS_strtoul(), which is a > nifty all-purpose routine that accepts inputs in bases 2 thru 36, can > auto-detect base, and does platform-independent overflow checking at > the cost of a division per digit. All those are features, but it > makes for sloooow conversion. OK, this code doesn't deal with unicode at all. The key observations are: A) to figure out the base, you pretty much need to get to the first digit; getting to the first non-zero digit is not that much worse. B) If you know the length of a string of digits (starting at the first non-zero digit) and the base, you know approximately how bits the result will have. You can do a single allocation if you are building a long. You can tell if you need to test for overflow in building an int; there is one length per base where you must. So the question becomes, is it worth taking two passes at the digits? Well, it sure looks like it to me, but I haven't timed one or two- character integers. I do longs in "megadigits" -- the largest set of digits that fits safely in SHIFT bits, so they have no need for overflow checks. For further excitement, you can use a similar technique to go from the number of bits to the string length. That should make for a fast convert int/long to string in any of 36 (or more, really) bases. I pass all of your mentioned test cases (including the one from a later message). I'm pretty much out of time for this project at the moment, but encouraging words would help me steal some time to finish. For anyone wanting to look at the code, or try it themselves: Installer: http://members.dsl-only.net/~daniels/dist/to_int-0.10.win32-py2.4.exe Just the 2.4 dll: http://members.dsl-only.net/~daniels/dist/to_int-0.10.win32.zip Sources: http://members.dsl-only.net/~daniels/dist/to_int-0.10.zip --Scott David Daniels [EMAIL PROTECTED] _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com