...
> -----------------------------
> Speeding up list append calls
> -----------------------------
>
> A `comp.lang.python message from Tim Peters`_ prompted Neal Norwitz
> to investigate how the code that Tim posted could be sped up. He
> hacked the code to replace var.append() with the LIST_APPEND opcode,
> ...
Someone want a finite project that would _really_ help their Uncle Timmy in his slow-motion crusade to get Python on the list of "solved it!" languages for each problem on that magnificent site?

    http://spoj.sphere.pl

It turns out that many of the problems there have input encoded as vast quantities of integers (stdin is a mass of decimal integers on one or more lines). Most infamous for Python is this tutorial problem (you don't get points for solving it), which is _trying_ to test whether your language of choice can read from stdin "fast enough":

    http://spoj.sphere.pl/problems/INTEST/

"""
The input begins with two positive integers n k (n, k <= 10**7). The
next n lines of input contain one positive integer t_i, not greater
than 10**9, each.

Output

Write a single integer to output, denoting how many integers t_i are
divisible by k.

Example

Input:
7 3
1
51
966369
7
9
999996
11

Output:
4
"""

There's an 8-second time limit, and I believe stdin is about 8MB (you're never allowed to see the actual input they use). They have a slower machine than you use ;-), so it's harder than it sounds. To date, 975 people have submitted a program that passed, but only a few managed to do it using Python. I did, and it required every trick in the book, including using psyco.

Turns out it's _not_ input speed that's the problem here, and not even mainly the speed of integer mod: the bulk of the time is spent in int(string) (and, yes, that's also far more important to the problem Neal was looking at than list.append time). If you can even track all the levels of C function calls that ends up invoking <wink>, you find yourself in PyOS_strtoul(), which is a nifty all-purpose routine that accepts inputs in bases 2 thru 36, can auto-detect base, and does platform-independent overflow checking at the cost of a division per digit. All those are features, but it makes for sloooow conversion.
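To make the "it's int(), not I/O" point concrete, here's a minimal modern-Python sketch of an INTEST-style solver (my own names, not my actual submission): the I/O is one bulk read and one split, so essentially all remaining time goes into the per-token int() calls.

```python
def count_divisible(data: bytes) -> int:
    """Count how many of the n integers in INTEST-format input are divisible by k."""
    tokens = data.split()                    # one split over the whole input
    n, k = int(tokens[0]), int(tokens[1])    # header line: n and k
    # The int(t) conversions below are the hot spot, not the reading itself.
    return sum(1 for t in tokens[2:2 + n] if int(t) % k == 0)
```

In a real submission you'd feed it `sys.stdin.buffer.read()` and print the result; on the sample input above it returns 4.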
I assume it's the overflow-checking that's the major time sink, and it's not correct anyway: it does the check slightly differently for base 10 than for any other base, explained only in the checkin comment for rev 2.13, 8 years ago:

    For base 10, cast unsigned long to long before testing overflow.
    This prevents 4294967296 from being an acceptable way to spell zero!

So what are the odds that base 10 was the _only_ base that had a "bad input" case for the overflow-check method used? If you thought "slim", you were right ;-) Here are other bad cases, under all Python versions to date (on a 32-bit box; if sizeof(long) == 8, there are different bad cases):

    int('102002022201221111211', 3) = 0
    int('32244002423141', 5) = 0
    int('1550104015504', 6) = 0
    int('211301422354', 7) = 0
    int('12068657454', 9) = 0
    int('1904440554', 11) = 0
    int('9ba461594', 12) = 0
    int('535a79889', 13) = 0
    int('2ca5b7464', 14) = 0
    int('1a20dcd81', 15) = 0
    int('a7ffda91', 17) = 0
    int('704he7g4', 18) = 0
    int('4f5aff66', 19) = 0
    int('3723ai4g', 20) = 0
    int('281d55i4', 21) = 0
    int('1fj8b184', 22) = 0
    int('1606k7ic', 23) = 0
    int('mb994ag', 24) = 0
    int('hek2mgl', 25) = 0
    int('dnchbnm', 26) = 0
    int('b28jpdm', 27) = 0
    int('8pfgih4', 28) = 0
    int('76beigg', 29) = 0
    int('5qmcpqg', 30) = 0
    int('4q0jto4', 31) = 0
    int('3aokq94', 33) = 0
    int('2qhxjli', 34) = 0
    int('2br45qb', 35) = 0
    int('1z141z4', 36) = 0

IOW, the only bases that _aren't_ "bad" are powers of 2, and 10 because it's special-cased (BTW, I'm not sure that base 10 doesn't have a different bad case now, but don't care enough to prove it one way or the other).

Now fixing that is easy: the problem comes from being too clever, doing both a multiply and an addition before checking for overflow. Check each operation on its own and it would be bulletproof, without special-casing. But that might be even slower (it would remove the branch special-casing 10, but add a cheap integer addition overflow check with its own branch).
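The failure mode is easy to model in Python. Below is a hedged sketch (my own function names, not CPython's actual C code) of a 32-bit unsigned long simulation: each bad string above spells exactly 2**32, which wraps to 0, and the combined multiply-then-add check's division happens to agree with the pre-multiply value, so the overflow slips through. Checking the multiply-add _before_ doing it catches the same input.

```python
MASK = 0xFFFFFFFF                      # 32-bit unsigned long: C arithmetic wraps mod 2**32

def combined_check(s, base):
    """Multiply and add first, then one division to test for overflow (the buggy scheme)."""
    result = 0
    for ch in s:
        c = int(ch, base)              # digit value
        prev = result
        result = (result * base + c) & MASK     # wraps silently, as in C
        if ((result - c) & MASK) // base != prev:
            raise OverflowError(s)
    return result

def per_op_check(s, base):
    """Test the multiply-add BEFORE doing it: bulletproof, no base-10 special case."""
    result = 0
    for ch in s:
        c = int(ch, base)
        if result > (MASK - c) // base:         # would result * base + c exceed MASK?
            raise OverflowError(s)
        result = result * base + c
    return result
```

Here `combined_check('102002022201221111211', 3)` quietly returns 0, while `per_op_check` raises OverflowError on the same string; legitimate inputs like `'ffffffff'` in base 16 convert identically under both.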
The challenge (should you decide to accept it <wink>) is to replace the overflow-checking with something both correct _and_ much faster than doing n integer divisions for an n-character input. For example, 36**6 < 2**32 - 1, so whenever the input has no more than 6 digits, overflow is impossible regardless of base and regardless of platform. That's simple and exploitable.

For extra credit, make int(string) go faster than preparing your taxes ;-)

BTW, Python as-is can be used to solve many (I'd bet most) of these problems in the time limit imposed, although it may take some effort, and it may not be possible without using psyco. A Python triumph I'm particularly fond of:

    http://spoj.sphere.pl/problems/FAMILY/

The legend at the bottom:

    Warning: large Input/Output data, be careful with certain languages

seems to be a euphemism for "don't even think about using Python" <0.9 wink>. But there's a big difference in this one: it's a _hard_ problem, requiring graph analysis, delicate computation, greater-than-double-precision precision (in the end), and can hugely benefit from preprocessing a batch of queries to plan out and minimize the number of operations needed.

Five people have solved it to date (click on "Best Solutions"), and you'll see that my Python entry is the second-fastest so far, beating 3 C++ entries by 3 excellent C++ programmers. I don't know what they did, but I suspect I was far more willing to code up an effective but tedious "plan out and minimize" phase _because_ I was using Python. I sure didn't beat them on reading the mass quantities of integers from stdin <wink>.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
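P.S. The length-based shortcut mentioned above is easy to verify and exploit. A hedged sketch (my own helper names, assuming a 32-bit unsigned long): six digits can never overflow in any base up to 36, so short inputs can skip the checks entirely.

```python
ULONG_MAX = 2**32 - 1                  # assuming sizeof(long) == 4

# The bound behind the shortcut: 6 digits are always safe, 7 already aren't.
assert 36**6 - 1 <= ULONG_MAX
assert 36**7 - 1 > ULONG_MAX

def unchecked_convert(s, base):
    """No overflow tests, no divisions: safe only because len(s) <= 6 is enforced by the caller."""
    result = 0
    for ch in s:
        result = result * base + int(ch, base)
    return result

def convert(s, base=10):
    if len(s) <= 6:                    # overflow impossible: take the fast path
        return unchecked_convert(s, base)
    # Longer inputs fall back to a fully checked conversion (stand-in shown here).
    value = int(s, base)
    if value > ULONG_MAX:
        raise OverflowError(s)
    return value
```

With this, `convert('zzzzzz', 36)` takes the unchecked path and returns 36**6 - 1, while `convert('4294967296')` correctly raises OverflowError instead of spelling zero.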