[issue45530] Improve listobject.c's unsafe_tuple_compare()

Stefan Pochmann Thu, 21 Oct 2021 16:19:33 -0700


Stefan Pochmann <stefan.pochm...@gmail.com> added the comment:


I see you mentioned that PyObject_RichCompareBool(..., Py_EQ) might be faster 
for example because it checks identity. Why not do an identity check before the 
ms->tuple_elem_compare calls? Too expensive and rarely successful?
 
> Extending the idea to positions beyond the first is dubious on those grounds: 
> if the first tuple positions in fact often match, then the optimization has 
> already backfired. Time to admit defeat then, not double down on failed 
> arrogance ;-)
 
I don't see that. First and second positions are quite different.

For example I sorted a million tuples where both first and second element are 
randomly chosen from a population of 10,000. So their amounts of duplication 
were the same. But these are the statistics from sorting:
- First position:   18,603,981 equality comparisons, 29.87% equal
- Second position:   5,556,203 equality comparisons,  0.09% equal
Many first positions matched, almost no second positions matched.

One more idea: What do you think of sorting lists of tuples (or tuple keys) in 
two stages?
1) Sort the list only by first position (comparing with only one 
tuple_elem_compare).
2) Identify equal-first-position-runs (with tuple_elem_compare) and sort each 
run independently (comparing only positions 1+).
Some experiments I did with this looked good, not sure if too off-topic to post 
here...

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45530>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue45530] Improve listobject.c's unsafe_tuple_compare()

Reply via email to