Eli the Bearded wrote: > I recently saw a link to an old post on a blog and then started looking > at the newer posts. This one: > > https://leancrew.com/all-this/2019/11/the-key-to-sorting-in-python/ > > discusses ways to deal with useful sorting of movie / television show > titles. Some initial words should be re-ordered for sorting purposes > (_Adventures of Huckleberry Finn, The_), roman numbers should sort like > regular numbers (_Rocky V_ comes before _Rocky IX_), and something needs > to be done about sorting accented vowels. > > But what caught my eye most, as someone relatively new to Python but > with long experience in C in Perl, is sorting doesn't take a > *comparison* function, it takes a *key generator* function, and that > function is supposed to transform the thing into something that the > native comparison knows how to compare. > > This seems a strange choice, and I'm wondering if someone can explain > the benefits of doing it that way to me.
Python 2 started with a comparison function and then grew a key function. With a key function you still have to compare items, you are just breaking the comparison into two steps: - Calculate the key. This can be slow as it has to be done once per item. - Compare the items. The key is usually a string or tuple, and for those the comparison runs C code which is a bit faster. The gain is even greater as the comparison is performed much more often. The disadvantages of the key-based approach are that some comparison functions cannot naturally be rewritten in this way and that memory usage is a bit higher. Both are typically "expert" problems, so the developers decided to "nudge" users to consider keys first. Now take a look at this example written in Python 2: import random import time import contextlib @contextlib.contextmanager def measure(message): start = time.time() yield span = time.time() - start print message.format(span) with open("/usr/share/dict/words") as f: words = f.read().splitlines() random.shuffle(words) def key_lower(a): global key_calls key_calls += 1 return a.lower() def cmp_lower(a, b): global cmp_calls cmp_calls += 1 return cmp(a.lower(), b.lower()) key_calls = 0 cmp_calls = 0 with measure("cmp-based sort took {} secs"): words[:].sort(cmp_lower) with measure("key-bases sort took {} secs"): words[:].sort(key=key_lower) print "cmp called", cmp_calls, "times" print "key called", key_calls, "times" $ python2 sort_demo.py cmp-based sort took 1.50221800804 secs key-bases sort took 0.293763160706 secs cmp called 1516111 times key called 99171 times -- https://mail.python.org/mailman/listinfo/python-list