>On my machine, the more realistic code, with an implicit C loop,
>    the_value = sum(the_increment for i in range(total_iters))
>gives the same value twice as fast as your explicit Python loop.
>(I cut total_iters down to 10**7).

Your code is faster for a number of reasons:

- range in Python 3 is implemented in C, so it is considerably faster. Because
  your range only goes up to 10 ** 7, the fastest iterator is used:
  rangeiterobject, whose 'next' function is implemented using native longs
  instead of CPython PyLongs:
      rangeiter_next(rangeiterobject *r)
  from rangeobject.c
- my code also does some extra work to output a progress indicator

>You might check whether sum uses an in-place accumulator for ints.

- you're right, sum actually works with native longs until the total
  overflows or you stop adding PyLongs; then it falls back to PyNumber_Add.
  Check:
      static PyObject *
      builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
  from bltinmodule.c

The focus of this experiment was in-place adds in general. While, as you've
shown, there are ways to write the loop optimally, the benchmark was written
as a huge loop just to showcase that this approach gives an improvement. The
performance improvement is a result of not having to allocate/deallocate a
PyLong per iteration.

A huge Python program with lots of PyLong in-place operations (not just adds;
this can be applied to all PyLong in-place operations), whether or not they
are in a loop, might benefit from such an optimization.

Thank you,
Catalin
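
For anyone who wants to reproduce the comparison discussed above, here is a minimal sketch. It is not the original benchmark: the function names, the iteration count and the increment are assumptions chosen for illustration, and the progress indicator is left out. It times the explicit loop against the sum() form, and it checks that on an unpatched interpreter an in-place add on a large int binds the name to a new object, which is the per-iteration allocation the experiment aims to avoid.

```python
import timeit


def explicit_loop(total_iters, the_increment):
    """Explicit Python loop: one in-place add per iteration."""
    the_value = 0
    for _ in range(total_iters):
        the_value += the_increment
    return the_value


def implicit_loop(total_iters, the_increment):
    """Loop driven from C by sum() over a generator expression."""
    return sum(the_increment for _ in range(total_iters))


def inplace_add_rebinds(value, increment):
    """Return True if `value += increment` bound `value` to a new object."""
    old = value  # keep the original object alive for the identity check
    value += increment
    return value is not old


if __name__ == "__main__":
    total_iters = 10**6  # assumed size; smaller than the 10**7 in the thread
    the_increment = 1    # assumed increment
    for fn in (explicit_loop, implicit_loop):
        seconds = timeit.timeit(lambda: fn(total_iters, the_increment), number=3)
        print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
    print("new object per big-int add:", inplace_add_rebinds(10**20, 1))
```

Timings depend on the machine and the Python version, so the ratio between the two loops may differ from the "twice as fast" reported above.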