Patches item #1479611, was opened at 2006-04-30 23:58
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1479611&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Core (C code)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: speed up function calls

Initial Comment:
Results: 2.86% faster for 1 arg (len), 11.8% for 2 args (min), and 1.6% for pybench.

trunk-speed$ ./python.exe -m timeit 'for x in xrange(10000): len([])'
100 loops, best of 3: 4.74 msec per loop
trunk-speed$ ./python.exe -m timeit 'for x in xrange(10000): min(1,2)'
100 loops, best of 3: 8.03 msec per loop

trunk-clean$ ./python.exe -m timeit 'for x in xrange(10000): len([])'
100 loops, best of 3: 4.88 msec per loop
trunk-clean$ ./python.exe -m timeit 'for x in xrange(10000): min(1,2)'
100 loops, best of 3: 9.09 msec per loop

pybench goes from 5688.00 down to 5598.00

Details about the patch:

There are 2 unrelated changes. They both seem to provide equal benefits for calling varargs C functions.

One is very simple: it inlines the call to a varargs C function rather than calling PyCFunction_Call(), which does extra checks whose results are already known. This moves meth and self up one block and breaks the C_TRACE into 2. (When looking at the patch, this will make sense, I hope.)

The other change is more dangerous. It modifies load_args() to hold on to tuples so they aren't allocated and deallocated. The initialization is done one time in the new func _PyEval_Init(). It allocates 64 tuples of size 8 that are never deallocated. The idea is that there usually won't be more than 64 frames with 8 or fewer parameters active on the stack at any one time (stack depth). There are cases where this can degenerate, but those should, for the most part, be only marginally slower; in general this should be a fair amount faster by skipping the alloc and dealloc and some extra work.
By decrementing the _last_index inside the needs_free blocks, behaviour could be improved.

This really needs comments added to the code. But I'm not gonna get there tonight. I'd be interested in comments about the code.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2006-05-05 01:27

Message:
Logged In: YES
user_id=33168

v2 attached. You might not want to review yet. I mostly did the first part of your suggestion (stats, _Fini, and stack-like, if I understood you correctly). I didn't do anything on the second part about inlining Function_Call. Perf seems to be about the same.

I'm not entirely sure the patch is correct yet. I found one or two problems in the original. I added some more comments.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-05-01 01:27

Message:
Logged In: YES
user_id=21627

The tuples should get deallocated when Py_Finalize is called.

It would be good if there was (conditional) statistical analysis, showing how often no tuple was found because the number of arguments was too large, and how often no tuple was found because the candidate was in use.

I think it should be more stack-like, starting off with no tuples allocated, then returning them inside the needs_free blocks only if the refcount is 1 (or 2?). This would avoid degenerate cases where some function holds onto its argument tuple indefinitely, thus consuming all 64 tuples.

For the other part, I think it would make the code more readable if it inlined PyCFunction_Call even more: the test for NOARGS|O could be integrated into the switch statement (one case for each), and VARARGS and VARARGS|KEYWORDS would both load the arguments, then call the function directly (possibly with NULL keywords). OLDARGS should goto either METH_NOARGS, METH_O, or METH_VARARGS depending on na (if you don't like goto, modifying flags would work as well).
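
The stack-like cache with statistics suggested in the comment above might look roughly like the toy model below. It is only a sketch in Python, not the C code from the patch: the class name `ArgTuplePool`, its methods, and the `still_referenced` flag (standing in for the refcount check) are all hypothetical, and lists stand in for argument tuples.

```python
class ArgTuplePool:
    """Toy model of a stack-like free list of argument buffers (hypothetical)."""

    def __init__(self, max_cached=64, max_args=8):
        self.max_cached = max_cached  # 64 in the thread, described there as a guess
        self.max_args = max_args      # 8 in the thread, described there as a guess
        self._free = []               # starts empty; filled only by release()
        self.hits = 0                 # a cached buffer was reused
        self.miss_too_large = 0       # call had more than max_args arguments
        self.miss_empty = 0           # no cached buffer was available

    def acquire(self, args):
        """Return a buffer whose first len(args) slots hold the arguments."""
        n = len(args)
        if n > self.max_args:
            self.miss_too_large += 1
            return list(args)         # ordinary allocation, never cached
        if self._free:
            buf = self._free.pop()    # most recently released buffer first
            self.hits += 1
        else:
            buf = [None] * self.max_args
            self.miss_empty += 1
        buf[:n] = args
        return buf

    def release(self, buf, still_referenced=False):
        """Cache buf again unless the callee kept it or the pool is full."""
        if (still_referenced or len(buf) != self.max_args
                or len(self._free) >= self.max_cached):
            return False
        buf[:] = [None] * self.max_args  # drop references to old arguments
        self._free.append(buf)
        return True


pool = ArgTuplePool(max_cached=2, max_args=3)
a = pool.acquire((1, 2))                # pool is empty: fresh buffer
pool.release(a)                         # goes onto the free list
b = pool.acquire((3,))                  # reuses the buffer just released
c = pool.acquire((1, 2, 3, 4))          # too many arguments for the pool
pool.release(b, still_referenced=True)  # callee kept it: not cached again
```

Because a buffer only re-enters the pool when nothing else holds it, a function that keeps its arguments costs one ordinary allocation instead of permanently using up a cache slot, and the two miss counters separate the "too many arguments" case from the "nothing available" case.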
----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-05-01 00:08

Message:
Logged In: YES
user_id=33168

I should note the numbers 64 and 8 are total guesses. It might be good to try and determine values based on empirical data.

----------------------------------------------------------------------

_______________________________________________
Patches mailing list
http://mail.python.org/mailman/listinfo/patches
