Rémi Lapeyre <remi.lape...@henki.fr> added the comment:

Hi everybody, I opened PR 11781 to add a key argument to functions in the bisect
module. I agree with @dmtr's points that this addition is not a bad design.

As far as I can tell, the key function is at called at most once per item as 
this
example where an assertion would break shows:


    import bisect
    from collections import defaultdict


    class Test:
        def __init__(self, value):
            self.value = value


    cache = defaultdict(int)

    def key(e):
        cache[e] += 1
        assert cache[e] <= 1
        return e.value


    l = [Test(i) for i in range(10000)]

    bisect.bisect(l, Test(25), key=key)


    ➜  cpython git:(add-key-argument-to-bisect) ./python.exe
    Python 3.8.0a1+ (heads/add-key-argument-to-bisect:b7aaa1adad, Feb  7 2019, 
17:33:24) 
    [Clang 10.0.0 (clang-1000.10.44.4)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import bisect
    >>> from collections import defaultdict
    >>> 
    >>> 
    >>> class Test:
    ...     def __init__(self, value):
    ...         self.value = value
    ... 
    >>> 
    >>> cache = defaultdict(int)
    >>> 
    >>> def key(e):
    ...     cache[e] += 1
    ...     assert cache[e] <= 1
    ...     return e.value
    ... 
    >>> 
    >>> l = [Test(i) for i in range(10000)]
    >>> 
    >>> bisect.bisect(l, Test(25), key=key)
    26


This argument can be used where the objects are immutable and I have not been 
able
to see changes in bisect speed using Victor Stinner perf module:

(cpython-venv) ➜  cpython git:(add-key-argument-to-bisect) ✗ make distclean && 
./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj 
&& python -m perf timeit --rigorous -s "from bisect import bisect" 
"bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
(cpython-venv) ➜  cpython git:(add-key-argument-to-bisect) ✗ git checkout 
cd90f6a369
(cpython-venv) ➜  cpython git:(cd90f6a369) ✗ make distclean && ./configure 
--with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m 
perf timeit --rigorous -s "from bisect import bisect" 
"bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
(cpython-venv) ➜  cpython git:(cd90f6a369) ✗ python -m perf compare_to 
cd90f6a369.json b7aaa1adad.json
Mean +- std dev: [cd90f6a369] 36.2 us +- 1.0 us -> [b7aaa1adad] 35.7 us +- 0.5 
us: 1.01x faster (-1%)

(cd90f6a369 was somtime faster than b7aaa1adad, sometime slower but they always
were less than one std dev from one another)

As I wrote in the discussion of the PR, I suspect the branch predictor to 
predict
reliably the branching in the hot path (though I don't know much about that and
would love some input).


For the record, here is the performance when a key function is given:

(cpython-venv) ➜  cpython git:(add-key-argument-to-bisect) ✗ python -m perf 
timeit --rigorous -s "from bisect import bisect" 
"bisect(range(1_000_000_000_000_000), 25, key=lambda e: e)"

.........................................
Mean +- std dev: 59.3 us +- 1.0 us

It seems to me that adding the key parameter is the best solution possible.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue4356>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to