On 8/31/2017 2:40 PM, Manciu, Catalin Gabriel wrote:
Hi everyone,

While looking over the PyLong source code in Objects/longobject.c I came
across the fact that the PyLong object doesn't implement the basic in-place
operations such as addition or multiplication:

[...]
     long_long,                  /*nb_int*/
     0,                          /*nb_reserved*/
     long_float,                 /*nb_float*/
     0,                          /* nb_inplace_add */
     0,                          /* nb_inplace_subtract */
     0,                          /* nb_inplace_multiply */
     0,                          /* nb_inplace_remainder */
[...]
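
Since those slots are NULL, += on an int falls back to the regular nb_add
slot and allocates a new object every time. That's easy to confirm from
Python itself (a trivial check):

        x = 10 ** 20     # a large int, well outside the small-int cache
        y = x            # keep a second reference to the original object
        x += 1
        print(x is y)    # False: += rebound 'x' to a newly allocated int
        print(y)         # the original object is unchanged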

While I understand that the immutable nature of this type of object justifies
this approach, I wanted to experiment and see how much of a performance gain
an in-place add would bring.
My in-place add falls back to calling the default long_add function when:
        - the refcount of the first operand indicates that it is being shared,
        or
        - that operand is one of the preallocated 'small ints',
which should mitigate the effects of not conforming to the PyLong immutability
specification.
It also allocates a new PyLong _only_ in case of a potential overflow.
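
To illustrate why both guards are needed, here's a quick Python-level sketch
(the small-int cache range and the exact refcount values are CPython
implementation details, not language guarantees):

        import sys

        # CPython preallocates the small ints -5 through 256; every
        # occurrence of such a value shares a single object, so it must
        # never be mutated in place.
        a = 200 + 56               # constant-folded to 256 at compile time
        b = 256
        print(a is b)              # True: both names reference the cached int

        # sys.getrefcount returns the reference count plus one for the
        # temporary reference held by its own argument; anything above
        # that baseline means the object is shared.
        x = 1 << 40                # well outside the small-int cache
        print(sys.getrefcount(x))  # typically 2: 'x' plus the argument
        y = x                      # a second reference makes it shared
        print(sys.getrefcount(y))  # now 3: an in-place add would be unsafe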

The workload I used to evaluate this is a simple script that does a lot of
inplace adding:

        import time
        import sys

        def write_progress(prev_percentage, value, limit):
                percentage = (100 * value) // limit
                if percentage != prev_percentage:
                        sys.stdout.write("%d%%\r" % (percentage))
                        sys.stdout.flush()
                return percentage

        progress = -1
        the_value = 0
        the_increment = ((1 << 30) - 1)
        crt_iter = 0
        total_iters = 10 ** 9

        start = time.time()

        while crt_iter < total_iters:
                the_value += the_increment
                crt_iter += 1
                
                progress = write_progress(progress, crt_iter, total_iters)
        end = time.time()

        print ("\n%.3fs" % (end - start))
        print ("the_value: %d" % (the_value))

Running the baseline version outputs:
./python inplace.py
100%
356.633s
the_value: 1073741823000000000

Running the modified version outputs:
./python inplace.py
100%
308.606s
the_value: 1073741823000000000

In summary, the modified version reduced the runtime by 13.47%.
The CPython revision I'm using is 7f066844a79ea201a28b9555baf4bceded90484f
from the master branch, and I'm running on an i7-6700K CPU with Turbo Boost
disabled (frequency pinned at 4 GHz).

Do you think such an optimization would be a good approach?

On my machine, the more realistic code, with an implicit C loop,
the_value = sum(the_increment for i in range(total_iters))
gives the same value twice as fast as your explicit Python loop.
(I cut total_iters down to 10**7).
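
For anyone who wants to reproduce the comparison, here is the minimal
harness I have in mind (a sketch; absolute times will vary by machine
and build):

        import time

        total_iters = 10 ** 7
        the_increment = (1 << 30) - 1

        # Explicit Python loop: the original workload, minus the
        # progress output.
        start = time.time()
        loop_value = 0
        for _ in range(total_iters):
                loop_value += the_increment
        loop_time = time.time() - start

        # Implicit C loop: sum() consumes the generator from C, avoiding
        # the per-iteration bytecode dispatch of the explicit loop above.
        start = time.time()
        sum_value = sum(the_increment for _ in range(total_iters))
        sum_time = time.time() - start

        assert loop_value == sum_value
        print("loop: %.3fs  sum: %.3fs" % (loop_time, sum_time))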

You might check whether sum uses an in-place accumulator for ints.

--
Terry Jan Reedy
