[issue23259] Remove dummy reuse optimization from sets
Serhiy Storchaka added the comment: If you want to decrease the number of executed instructions, I suggest to duplicate lookup functions (actually add boolean flag and switch one of two branches). On read lookup function should ignore dummy entries, on modification it should behave as now. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Serhiy Storchaka added the comment: In some cases the slowdown is over 100x. $ ./python -m timeit -s s = set(range(10**4)) -- s.discard(0); s.add(0) Unpatched: 100 loops, best of 3: 1.68 usec per loop Patched: 1 loops, best of 3: 183 usec per loop -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Raymond Hettinger added the comment: I'll just close this one. There seems to be little interest in it. Originally, I characterized freeslot tracking as an optimization that mildly benefited an uncommon case (adds/deletes/moreadds) at the expense of the common cases (uniquification, membership testing, and set-to-set operations). As Antoine pointed-out, on some systems the cost of a predictable branch routinely not take can be almost zero. There was still benefit in code simplification, reducing the size of the stack frame, freeing a register that needed to saved and restored when called PyRich_CompareBool(), etc. But no one seems to care about those. As Serhiy found, there is one case where the freeslot tracking benefit wasn't mild. In the unusual but possible case of deleting and reinserting the exact same value in a large set, dummy reuse prevents a catastrophic linear pile-up. And so it goes. Free slot tracking lives. -- resolution: - rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Changes by Raymond Hettinger raymond.hettin...@gmail.com: -- Removed message: http://bugs.python.org/msg234210 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Serhiy Storchaka added the comment: This can be a common case in following algorithm (mesh optimization): while elems: elem = elems.pop() changed = optimize(elem) if changed: elems.update(neighbors(elem)) -- nosy: +serhiy.storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
New submission from Raymond Hettinger: The lookkey routines in Object/setobject.c have logic to track the first open freeslot in a search. The benefit is that new keys can reuse previously deleted slots. The benefit only occurs in cases where keys are added, then some removed, and then more added with no intervening resizes (which clear all dummy entries) and only when the newly added keys happen to land on previously deleted entries. The cost of the optimization is the extra logic in the inner search loop, an extra freeslot stackframe field that needs to be saved and restored, and extra logic on the loop exit. It also causes dummies to leak out of the lookkey routines so that the code in contains(), discard(), and insert() all have to check for the dummy case. This patch removes the freeslot tracking. It saves one field on the stackframe, simplifies the lookkey inner-loop logic, simplifies the lookkey found_null logic, it confines knowledge of dummies to just the lookkey functions, and it simplifies the code in the insert(), contains(), and discard() functions. In short, it looks like the freeslot idea was a net negative -- it optimized an uncommon case at the cost of slowing and complicating the common cases. -- assignee: rhettinger components: Interpreter Core files: late8.diff keywords: patch messages: 234198 nosy: rhettinger priority: low severity: normal stage: patch review status: open title: Remove dummy reuse optimization from sets type: performance versions: Python 3.5 Added file: http://bugs.python.org/file37750/late8.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Raymond Hettinger added the comment: Even in the mesh algorithm, we let resizing periodically clean-up the dummies. The idea is to not pay the freeslot tracking cost on every lookup and instead only clean-up periodically (which would likely give better performance for the mesh algorithm as well, since making a single pass clean-up during resizing is cheaper than doing multi-step tracking for every insertion). The slots do get reused, just not immediately. Also, the idea is to not let the possibility of pop-change-update algorithms create a cost for the more common uses of sets (uniquification, fast membership testing, and set-to-set operations such as union, intersection, and difference). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Antoine Pitrou added the comment: In short, it looks like the freeslot idea was a net negative -- it optimized an uncommon case at the cost of slowing and complicating the common cases. Do you have a benchmark showing the slowing down? -- nosy: +pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Raymond Hettinger added the comment: I've observed the generated assembly has fewer instructions on the critical path and that a register was freed-up. That's enough for me in this case (it's too easy to get trapped in local minimums in timing small changes like this). Do either of you have a technical comment on the patch? If it isn't broken or seriously misguided, I will likely apply it shortly. -- nosy: +tim.peters ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23259] Remove dummy reuse optimization from sets
Antoine Pitrou added the comment: Fewer instructions doesn't necessarily translate into better performance. The bottleneck in superscalar, pipelined processors is often the data dependency path between instructions. Adding instructions may as well not slow anything down, if those instructions don't lengthen the dependency path. It would be interesting to know what the impact of the patch is on the workloads that were supposed to be helped by the removed logic. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23259 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com