Re: Q: sort's key and cmp parameters
Bearophile bearophileh...@lycos.com writes: sorting, and something that's surely not bug-prone. In such situation having a 'key' argument is *better*. Such sort can be stable. Nothing stops comparison sorting from being stable. Since the rest of your post seems premised on the opposite, I hope that clears things up for you. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
On Tue, 06 Oct 2009 23:20:13 -0700, Paul Rubin wrote: Bearophile bearophileh...@lycos.com writes: sorting, and something that's surely not bug-prone. In such situation having a 'key' argument is *better*. Such sort can be stable. Nothing stops comparison sorting from being stable. Since the rest of your post seems premised on the opposite, I hope that clears things up for you. I'm sure Paul already knows this, but key-based sorts are comparison sorts. There are two basic types of sorts: comparison based, where the routine has to compare items (usually with the operator), and non-comparison sorts, like bucket sort, pigeon-hole sort and radix sort. These sorts require special knowledge of the items being sorted, and don't need to compare two items. General purpose sorts like Python's sort() do, regardless of whether you pass a key function, a three-way comparison function, or something else. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin: Bearophile: sorting, and something that's surely not bug-prone. In such situation having a 'key' argument is *better*. Such sort can be stable. Nothing stops comparison sorting from being stable. Since the rest of your post seems premised on the opposite, I hope that clears things up for you. When I have written that post I was partially unfocused, I am sorry. What I meant is that a general sorting routine, even in D, is better to be first of all flexible. So I think it's better for the D built-in sort to be stable, because such extra invariant allows you to use the sort in more situations. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Bearophile bearophileh...@lycos.com writes: What I meant is that a general sorting routine, even in D, is better to be first of all flexible. So I think it's better for the D built-in sort to be stable, because such extra invariant allows you to use the sort in more situations. Note that stable sort has additional memory requirements. In situations where you don't need stability, but do need memory-efficient in-place sorting, an unstable sort might well be preferred. This is why libraries such as C++'s STL offer both. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Hrvoje Niksic: Note that stable sort has additional memory requirements. In situations where you don't need stability, but do need memory-efficient in-place sorting, an unstable sort might well be preferred. This is why libraries such as C++'s STL offer both. There are stable sorts that need no additional memory. In a largish program written in a general-purpose multi-level language, like D (this is probably true for C++ too), often 80% of the code takes a very little time to run. Such code may need to process few small arrays, or small data sets, or to manage the GUI, etc. So optimizing those parts of the code for performance (both in memory used and CPU cycles used) is stupid. What you want in such large parts of the code is: - to write code quickly - to have code that's short, readable, easy to fix, and most important of all that is the less bug-prone as possible. So what you need in such large parts of the code is something that's very flexible and safe, even if it's not top performance (both in memory and CPU). This is why for example in D there are built-in associative arrays, that have a handy syntax, built-in methods, and they are never O(n^2), even if they are not the most efficient ones where you need max performance, or where you need minimal memory used, or where you need to perform unusual operations. Such built-ins are useful to reduce both code length and bug count, because they are quite safe and easy to use. Stable sorts are a little safer, because they don't scramble your data in certain ways, so they can be used in more situations. This is why having the Timsort in Python is good. When you run profile the D code and you see your program uses too much RAM or wastes too much time in the built-in sort, then you can switch to using special sorts from the std lib. You can even write your own hash or sort for special situations (and you don't have to drop down to use another language for that, you keep using the same, even if you may want to change your programming stile, and use a more C-like style, with no dynamic allocations, some bit twidding, pointers, more structs, 16 bit integers, unsigned integers, unions, compiler intrinsics, even inlined assembly that uses SSE3, etc). In normal code you may want to adopt a more Java-like programming style, or templates, or even using a large library I have written, you may program it almost as a curious Python-like, with lazyness too. This is why I think the built-in sort has to be flexible, because you are supposed to use it most of the times, and most of the times you don't need top performance. Different parts of a program have to be optimized for different things, computer performance, or programmer performance. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
On Tuesday 06 October 2009 02:40:46 pm Paul Rubin wrote: The problem is that if you allow to use the cmp, lot of programmers will use it straight away, not even bothering to know what that strange 'key' argument may be useful for. And they miss the how much handy 'key' is. Given how often we hear consenting adults as justification for any number of gaps in Python error checking, the argument above is singularly unpersuasive. Well, as long as you consider them gaps in need of a justification, of course that argument (and the one about the gaps themselves) will of course seem singularly unpersuasive. But if you see them as a feature (that may sometimes, albeit rarely, missfire), then you would have no problem with /either/ argument. -- Luis Zarrabeitia (aka Kyrie) Fac. de Matemática y Computación, UH. http://profesores.matcom.uh.cu/~kyrie -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
[Hrvoje Niksic] Note that stable sort has additional memory requirements. In situations where you don't need stability, but do need memory-efficient in-place sorting, an unstable sort might well be preferred. This is why libraries such as C++'s STL offer both. FWIW, the additional memory requirements are typically a set of pointers to the objects being sorted, so the memory overhead is typically very small relative to the size of the objects being sorted. IOW, this isn't much of a consideration in most Python apps. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger: Psychologically, the thing that I find to be interesting is that beginners and intermediate users seem to take to key functions more readily than old timers. The key function seems to be an easy thing to teach (perhaps because that's the way Excel sorts and the way SQL sorts, or perhaps it is because problem statements seem to arise in the form of sort by this, then by that instead of here's how two objects should be compared). In contrast, some people who have have had deep experience with cmp functions may tend to first think of cmp solutions and then have a harder time seeing solutions with key functions. If you grew-up on C's qsort() like I did, then a key function may not be the first thing that pops into your head. I love the 'key', it makes my code simpler and it's simpler to understand. I am not opposed to the idea of keeping cmp, that in some rare cases may be useful or more natural. The problem is that if you allow to use the cmp, lot of programmers will use it straight away, not even bothering to know what that strange 'key' argument may be useful for. And they miss the how much handy 'key' is. I am having a very hard type (despite my implementation, explanations, etc) explaining to D programmers why for a flexible general-purpose function a key function argument is often better. They in the end have added something like that in the std lib, but with a weird name like SchwartzSort that's surely not the first standard sort they look for... this is bad. So a funny solution to this situation may be the following: to keep only 'key' in the sort/sorted for few years, so the community of Python programmers gets used to 'key' only, and then put 'cmp' back in, for the special cases ;-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Bearophile bearophileh...@lycos.com writes: The problem is that if you allow to use the cmp, lot of programmers will use it straight away, not even bothering to know what that strange 'key' argument may be useful for. And they miss the how much handy 'key' is. Given how often we hear consenting adults as justification for any number of gaps in Python error checking, the argument above is singularly unpersuasive. I am having a very hard type (despite my implementation, explanations, etc) explaining to D programmers why for a flexible general-purpose function a key function argument is often better. They in the end have added something like that in the std lib, but with a weird name like SchwartzSort that's surely not the first standard sort they look for... this is bad. key and the DSU (Schwartz) sorting pattern makes lots of sense in scripting languages like Python that are primarily used on relatively small data sets. The best reason for writing something in a lower level language like D is because you want to push the limits of your availiable machine resources. Therefore, the DSU strategy's extra memory consumption will be what you want less of the time than it is in Python. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
[bearophile] I love the 'key', it makes my code simpler and it's simpler to understand. I am not opposed to the idea of keeping cmp, that in some rare cases may be useful or more natural. The problem is that if you allow to use the cmp, lot of programmers will use it straight away, not even bothering to know what that strange 'key' argument may be useful for. And they miss the how much handy 'key' is. ... So a funny solution to this situation may be the following: to keep only 'key' in the sort/sorted for few years, so the community of Python programmers gets used to 'key' only, and then put 'cmp' back in, for the special cases ;-) FWIW, sorting is only a small part of the picture. The notion of three-way cmp functions was removed throughout the language. The built-in cmp() function and the __cmp__ magic method and the corresponding C slot were all excised. None of them played nicely with rich comparisons. It was a case where having two-ways-to-do-it was complicating everyone's lives. AFAICT, very few people understood exactly how the two interacted -- the under-the-hood C code for it was complex, unattractive, and hard to understand. I am having a very hard type (despite my implementation, explanations, etc) explaining to D programmers why for a flexible general-purpose function a key function argument is often better. Last week, I learned a new term, Code Prion, that referred to a code technique such as cmp-functions which cause the proteins of the brain to become contagiously misfolded so that the victim 1) always prefers that technique, 2) easily infects others, and 3) is resistant to sterilization (using other techniques) ;-) Once Kernighan and Ritchie put C's qsort() into the food supply, we were doomed. FWIW, I think the standard library's unittest module is also a code prion -- there's no other explanation for its thriving despite competition from the vastly superior syntax of py.test or nose. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger pyt...@rcn.com writes: Once Kernighan and Ritchie put C's qsort() into the food supply, we were doomed. It was already too late. Knuth vol 3 came out in 1973(?) and its sorting half is mostly about comparison sorting. FWIW, I think the standard library's unittest module is also a code prion -- there's no other explanation for its thriving despite competition from the vastly superior syntax of py.test or nose. unittest is sort of a clone of JUnit if I understand correctly. py.test and nose are obscure because they're not in the stdlib. I have used unittest a little bit but I mostly use doctest these days. I have the impression that py.test and/or nose are better in some ways but I haven't bothered checking further. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin: bearophile: I am having a very hard type (despite my implementation, explanations, etc) explaining to D programmers why for a flexible general-purpose function a key function argument is often better. They in the end have added something like that in the std lib, but with a weird name like SchwartzSort that's surely not the first standard sort they look for... this is bad. key and the DSU (Schwartz) sorting pattern makes lots of sense in scripting languages like Python that are primarily used on relatively small data sets. The best reason for writing something in a lower level language like D is because you want to push the limits of your availiable machine resources. Therefore, the DSU strategy's extra memory consumption will be what you want less of the time than it is in Python. I think you are quite wrong. In D dynamic arrays are built-in, they are used almost as Python lists (but they are single-typed, unless you create an array of variants), among other things they have a built-in sort. Such sort is used for small situations too. Even in a language like D *very* often what you need is to sort a small amount of data in a very flexible way. In such situations what you need isn't to squeeze the last byte or CPU cycle out of the PC, but to have a handy and short syntax, a very flexible sorting, and something that's surely not bug-prone. In such situation having a 'key' argument is *better*. Such sort can be stable. That's why the main sort/sorted routines in my dlibs accept an optional key callable, and they are very handy. I can show you examples. On the other hand once in a while in a D program you may need to sort a very large amount of data, or you may need something faster (and unstable, like a refined introsort based on a 2-pivot QS), or even more special than the things allowed by the built in sort. In such situation you can import a special sorting function from the std lib and use it, such sort can have something equivalent to a cmp (but lower level, it can accept a function template alias, to inline the comparison operation). So the idea is: in a multi-level language you very often have to sort a limited amount of data in complex ways. Because in most programs a quite large percentage of the code isn't performance-critical. In such large parts of the code what you want is something very handy, safe and flexible. So having a key function is positive. In the few spots where you need high performance, you can import (or even implement with your hands) and use a specialized sorting routine. D is a general purpose language, it's not used like C in Python projects to write performance-critical spots only. This is also visible in other parts of the language, for example there are built-in associative arrays. Their syntax is handy, and even if they aren't high performance (they are sometimes slower than Python dicts despite being O(1), but they never have a O(n^2) performance profile, as may happen to Python dicts, so they are safer) you can use them in a very quick way, even if you have to create a 10-items associative array. The end result is that in D programs you can find more hashes than in C++ code, where I think programmers use them only where simpler solutions (like iterating on an array) aren't good enough. (So D AAs have to be fast even in situations where you put 8 items inside them, like in Python dicts). This free usage of AAs makes the code stronger too (because once in a while you may put 1000 items in such AA, and it will keeps working efficiently, while a linear scan in a list of 1000 items starts to become not efficient). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
On Tue, 06 Oct 2009 12:28:00 -0700, Raymond Hettinger wrote: [bearophile] I love the 'key', it makes my code simpler and it's simpler to understand. I am not opposed to the idea of keeping cmp, that in some rare cases may be useful or more natural. The problem is that if you allow to use the cmp, lot of programmers will use it straight away, not even bothering to know what that strange 'key' argument may be useful for. And they miss the how much handy 'key' is. ... So a funny solution to this situation may be the following: to keep only 'key' in the sort/sorted for few years, so the community of Python programmers gets used to 'key' only, and then put 'cmp' back in, for the special cases ;-) FWIW, sorting is only a small part of the picture. The notion of three-way cmp functions was removed throughout the language. The built-in cmp() function and the __cmp__ magic method and the corresponding C slot were all excised. None of them played nicely with rich comparisons. It was a case where having two-ways-to-do-it was complicating everyone's lives. AFAICT, very few people understood exactly how the two interacted -- the under-the-hood C code for it was complex, unattractive, and hard to understand. There's no reason why a hypothetical comparison function for sort() needs to be a three-way comparison function. All you need to implement sorting is a function to implement less-than -- I believe that sort() and min() are already implemented in such a way as to only require the objects implements __lt__. Hypothetically, sort(), min() and heapq.nsmallest() could take an optional less-than comparison function, and max() and heapq.nlargest() could use a greater-than comparison function. Of course you can do this now with a cost of O(N) and a helper class: class SortHelper: def __init__(self, x): self.x = x def __lt__(self, other): return self.x other.x + 42 # or whatever... # Decorate, Sort, Undecorate. mylist = map(SortHelper, mylist) mylist.sort() mylist = [obj.x for obj in mylist] If your list is huge, you can do the decoration in place: for i, x in enumerate(mylist): mylist[i] = SortHelper(x) mylist.sort() for i, obj in enumerate(mylist): mylist[i] = obj.x but since this is going to especially useful for really big lists, paying that extra O(N) cost twice will hurt. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger pyt...@rcn.com writes: So, it looks like the relevant comparison values can be stored in nested lists: list_of_lists = \ [[1, [3, [], []], [7, [5, [], []], []]], [19, [23, [], []], []], ] Now you've copied all the data out of the original tree, which is even worse than putting a class wrapper around it. The comparison remains (asymptotically) as expensive as before, too. Are there any of these trees that cannot be transformed (by a key function) into an equivalent nested list of lists and values so that the trees can be directly compared to one another using Python's normal built-in recursive comparison order for lists? I think so. Say you want to change the numeric comparisons so that even numbers always sort before odd numbers, ie. -4 -2 0 2 4 ... -999 ... -1 1 3 ... I don't think there is a way to re-encode the numbers so they can be compared with direct integer comparison. Even if there is, I think there are reasonable orderings on the trees themselves that can't possibly be embedded in the standard python comparison order. I might be able to construct an example using ordinal diagrams from proof theory. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
So, it looks like the relevant comparison values can be stored in nested lists: list_of_lists = \ [[1, [3, [], []], [7, [5, [], []], []]], [19, [23, [], []], []], ] Now you've copied all the data out of the original tree, which is even worse than putting a class wrapper around it. The comparison remains (asymptotically) as expensive as before, too. FWIW, you could also just flatten it to: [(1,3,7,5), (19,23)]. The point is that the tree comparisons you presented have a predetermined order of values to compare, so they either be recast a flattened list of comparison values or the tree itself can be cast in a list of lists form. Either way, the O(n log n) step of doing the actual comparisons runs much faster than if you coded a recursive cmp function written in pure python. Say you want to change the numeric comparisons so that even numbers always sort before odd numbers, ie. -4 -2 0 2 4 ... -999 ... -1 1 3 ... This is too easy: s = [-2, 4, 2, -1, -3, 1, -4, 0, 3] s.sort() s.sort(key=lambda x: x%2) s [-4, -2, 0, 2, 4, -3, -1, 1, 3] The use cases you've presented so far aren't convincing, but I'm sure that if you keep making-up random, weird use cases, there will be one where a cmp function is a better approach. I think there are reasonable orderings on the trees themselves that can't possibly be embedded in the standard python comparison order. I might be able to construct an example using ordinal diagrams from proof theory. As long as the values of a tree get compared in a predetermined order, there will always be a flattened list equivalent that works faster using a key function. If you're going to concoct something isn't easily transformable to a key function, I think your best bet is to create a comparison where the trees have an interaction other than comparing values at identical node positions; otherwise, the tree shape hasn't made the problem any more difficult than sorting a list of successive values. The concocted comparison function needs to exploit some aspect of the tree shape than can't be precomputed in a key function (perhaps involving some complex interaction between the two compared items like a tree merge or somesuch). Instead of trying to create a hard comparison from first principles, there may already be some research on the subject in the SQL world. It seems to me that SQL's ORDER BY clauses are essentially just key functions, so maybe there are known cases where a query cannot be sorted with just the tools provided by SQL. Raymond P.S. I accept than you hate sorting in Py3.x. There's no need to convince me of that. I was just curious about your one real-world use case and whether I could find a straight-forward key function that would get the job done. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Say you want to change the numeric comparisons so that even numbers always sort before odd numbers, ie. -4 -2 0 2 4 ... -999 ... -1 1 3 ... This is too easy: s = [-2, 4, 2, -1, -3, 1, -4, 0, 3] s.sort() s.sort(key=lambda x: x%2) s [-4, -2, 0, 2, 4, -3, -1, 1, 3] I don't think so: s = [0, 2, -2, 3, 1, -1, 4, -4, -3] s.sort(key=lambda x:x%2) s [0, 2, -2, 4, -4, 3, 1, -1, -3] There were two sorts in my post and only one in yours. That's why you didn't get the same answer. s.sort(key=lambda x:(x%2,x)) might work but why are you so opposed to being able to write one's natural thought pattern if that pattern happens to be a comparison function? Who wants to concoct these ad-hoc encodings for every ordering? Your (x%2, x) idea seems like a good way to do it in a single pass. Also, it seems to be a natural translation of the problem statement, even numbers always sort before odd numbers, into a primary key and secondary sort key. How about this: gen_e, gen_pi, gen_sqrt2, etc. all generate representations of various real numbers as possibly infinite sequences of digits: def gen_pi(): yield 3 yield 1 yield 4 # ... compute infinite stream one digit at a time Comparison is obvious if (hmmm) you can guarantee that you're sorting unique elements and that the sort function won't compare an element with itself (otherwise I guess you could cut off comparison at a million digits or whatever): def cmp(gen1, gen2): for d,e in izip(gen1(), gen2()): c = cmp(d,e) if c: return c I don't see any clean way to handle that with key=, though of course maybe I'm missing something. Right. I would be forced to use a Cmp2Key wrapper for this one. If sorting lazily evaluated infinite sequences is a common use case for you, then you're going to love Haskell ;-) I don't hate sorting in py3.x. That's good to hear. I understand the concept that py3.x introduces incompatibilities to make things better. What I don't like is the approach of breaking stuff gratutiously with no benefit. If the positional arg for cmp is a kludge, why not switch it to a keyword arg instead of eliminating it? When Guido made the final call, I'm sure he was balancing a number of goals including language simplification and one-way-to-do-it. I'm also sure that sorting lazily evaluated infinite sequences was not on his radar screen :-) key= is very useful but it is a derived operation, not the other way around. As you showed with infinite sequence example, it may well be the case that cmp is more general and can more readily handle an exotic case. Psychologically, the thing that I find to be interesting is that beginners and intermediate users seem to take to key functions more readily than old timers. The key function seems to be an easy thing to teach (perhaps because that's the way Excel sorts and the way SQL sorts, or perhaps it is because problem statements seem to arise in the form of sort by this, then by that instead of here's how two objects should be compared). In contrast, some people who have have had deep experience with cmp functions may tend to first think of cmp solutions and then have a harder time seeing solutions with key functions. If you grew-up on C's qsort() like I did, then a key function may not be the first thing that pops into your head. One other interesting data point: Python uses key functions in min(), max(), heapq.nsmallest(), heapq.nlargest, and itertools.groupby(). Those functions never supported a cmp function argument, nor has one ever been requested. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger pyt...@rcn.com writes: There were two sorts in my post and only one in yours. That's why you didn't get the same answer. Whoops, missed that. When Guido made the final call, I'm sure he was balancing a number of goals including language simplification and one-way-to-do-it. I'm also sure that sorting lazily evaluated infinite sequences was not on his radar screen :-) I can't help wondering if there will be feature requests for a cmp argument for the rest of eternity until it eventually makes its way back in. Psychologically, the thing that I find to be interesting is that beginners and intermediate users seem to take to key functions more readily than old timers. I think key is preferable in at least 90% of the cases. I also think a general purpose language should be able to handle 100% of the cases, so if key is all you have, there can be a gap to fill. In contrast, some people who have have had deep experience with cmp functions may tend to first think of cmp solutions and then have a harder time seeing solutions with key functions. If you grew-up on C's qsort() like I did, then a key function may not be the first thing that pops into your head. That could be. One other interesting data point: Python uses key functions in min(), max(), heapq.nsmallest(), heapq.nlargest, and itertools.groupby(). Those functions never supported a cmp function argument, nor has one ever been requested. Interesting. Maybe someday... -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
On Mon, 05 Oct 2009 19:34:27 -0700, Paul Rubin wrote: Raymond Hettinger pyt...@rcn.com writes: When Guido made the final call, I'm sure he was balancing a number of goals including language simplification and one-way-to-do-it. I'm also sure that sorting lazily evaluated infinite sequences was not on his radar screen :-) I can't help wondering if there will be feature requests for a cmp argument for the rest of eternity until it eventually makes its way back in. When I learned that cmp was being dropped, I was horrified. That lasted for about fifteen minutes of time actually using key :) Occasionally I come across sorting problems where it isn't obvious how to write the key function, but I've never come across one where it wasn't possible at all. I no longer miss cmp. Psychologically, the thing that I find to be interesting is that beginners and intermediate users seem to take to key functions more readily than old timers. I think key is preferable in at least 90% of the cases. I also think a general purpose language should be able to handle 100% of the cases, so if key is all you have, there can be a gap to fill. I don't think it's possible to enumerate 100% of the cases, let alone support them. Here's one thing I've sometimes wanted to do: sort multiple lists at once, according to the contents of one of them, with a single call. You can get the same result with a key function and sorting each list separately, but if you've got a lot of large lists, that becomes expensive. Python's sort is a memory-sort -- you can't sort a list too big to fit into memory. Sorting 400GB of data using a single function call is not something I see as fundamental to a general purpose language (although of course you should be able to write a disk-sort utility using Python). Nor does sort() support the case where the value of the comparisons differs according to where the elements are in the list, e.g. given: [a, b, c, d, a, b] the result of comparing a and b at the start of the list is different from comparing them at the end of the list. I don't think there's an actual use-case for this feature, and cmp would have just as much trouble solving this (pretend) problem as key does, but I don't think languages need to supply this as a fundamental operation. If you need it, write your own sort function :) Which of course would be a solution to your problem. The source code for timsort is available, and stable. You could fork it, re-enable the support for comparison functions, and put it on the Cheeseshop. If there is sufficient interest, maybe you'll get cmp support re-added to the built-in in a version or two. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
[Paul Rubin] Example of list of trees (nested dicts). In practice you could get such a list from the simplejson module: list_of_trees = [{'value':1, 'left':{'value':3,'left':None,'right':None}, 'right':{'value':7,'left':{'value':5, ...}}}, {'value':19, 'left':{'value':23', ...}}, ... ] So, it looks like the relevant comparison values can be stored in nested lists: list_of_lists = \ [[1, [3, [], []], [7, [5, [], []], []]], [19, [23, [], []], []], ] Here I've used a transformation where a tree is stored as: [tree['value'], tree['left'], tree['right']] and the None value indicating an empty tree transformed to: [] Example comparison function: def compare(tree1, tree2): c = cmp(tree1['value'], tree2['value']) if c != 0: return c c = compare(tree1['left'], tree2['left']) if c != 0: return c return compare(tree1['right'], tree2['right]) Hmm, that recursive comparison order looks like how Python already recursively compares lists. So, perhaps the lists can be compared directly: if list_of_lists[0] list_of_lists[1]: ... Are the trees user defined classes? Not in general. They might be nested tuples, lists, or dictionaries. Or they could come from a non-extensible library-defined class, like from cElementTree Are there any of these trees that cannot be transformed (by a key function) into an equivalent nested list of lists and values so that the trees can be directly compared to one another using Python's normal built-in recursive comparison order for lists? Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
[Paul Rubin] The idea was that you have a list of trees that you want to sort, and an ordering relation between trees: def gt(tree1, tree2): ... where you recursively descend both trees until you find an unequal pair of nodes. You're not trying to sort the nodes within a single tree. Can you give an example of a list of trees and a cmp function that recursively compares them? Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
[Paul Rubin] The idea was that you have a list of trees that you want to sort, and an ordering relation between trees: def gt(tree1, tree2): ... Are the trees user defined classes? Can the gt() function be added incorporated as __lt__ method so that you can just run a plain sort: sort(list_of_trees) Is the recursive search order something you can turn into a straight sequence: sort(list_of_trees, key=flattener) IOW, if there is an ordering relation between the trees, why can't it be either part of the tree API or collapsable into a list of successive nodes to be compared. From the sound of it, the trees are static during the sort and would get a nice O(n log n) -- O(n) speed-up if a key function were allowed to flatten them in a single pass. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger pyt...@rcn.com writes: Can you give an example of a list of trees and a cmp function that recursively compares them? Example of list of trees (nested dicts). In practice you could get such a list from the simplejson module: list_of_trees = [{'value':1, 'left':{'value':3,'left':None,'right':None}, 'right':{'value':7,'left':{'value':5, ...}}}, {'value':19, 'left':{'value':23', ...}}, ... ] Example comparison function: def compare(tree1, tree2): c = cmp(tree1['value'], tree2['value']) if c != 0: return c c = cmp(tree1['left'], tree2['left']) if c != 0: return c return cmp(tree1['right'], tree2['right]) Are the trees user defined classes? Not in general. They might be nested tuples, lists, or dictionaries. Or they could come from a non-extensible library-defined class, like from cElementTree. From the sound of it, the trees are static during the sort and would get a nice O(n log n) -- O(n) speed-up if a key function were allowed to flatten them in a single pass. But the key function has to do all those comparisons on the internal nodes. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin http://phr...@nospam.invalid writes: Example comparison function: def compare(tree1, tree2): c = cmp(tree1['value'], tree2['value']) if c != 0: return c c = cmp(tree1['left'], tree2['left']) if c != 0: return c return cmp(tree1['right'], tree2['right]) Sorry, meant recursive comparison. def compare(tree1, tree2): c = cmp(tree1['value'], tree2['value']) if c != 0: return c c = compare(tree1['left'], tree2['left']) if c != 0: return c return compare(tree1['right'], tree2['right]) -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin http://phr...@nospam.invalid writes: c = compare(tree1['left'], tree2['left']) Of course this recursive call crashes if either branch is None. Oh well, I'll stop trying to correct it since I'm sure you get the idea. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
In 7xtyyhikrl@ruckus.brouhaha.com Paul Rubin http://phr...@nospam.invalid writes: Python 2.x provides two ways and you can use whichever one fits the application better. I have never understood why Python 3.x finds it necessary to break one of them. Maybe I can migrate to Haskell by the time Python 2.x becomes deprecated. !!! Maybe Haskell is much handier than I give it credit for, but it's hard for me to imagine that it is as convenient as Python 3, even without the cmp sort option... (And I agree with you that getting rid of sort's cmp in Python 3 was a bad idea.) What's going on here? Our lab recently hired a new postdoc who, to our disbelief, works almost exclusively in OCaml. And I hear all this talk about switching to Haskell or Scheme. I don't get it. Despite the elegance of these languages, the libraries are not there. It seems to me it would take forever to get the simplest things done in these languages... Confused. kj -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
kj no.em...@please.post writes: !!! Maybe Haskell is much handier than I give it credit for, but it's hard for me to imagine that it is as convenient as Python 3, even without the cmp sort option... Heh, yeah, I was being a bit snarky/grouchy. Haskell has a very steep learning curve and will never be as convenient as Python for banging out some small script. It's worth considering for larger or more serious programs. What's going on here? Our lab recently hired a new postdoc who, to our disbelief, works almost exclusively in OCaml. And I hear all this talk about switching to Haskell or Scheme. I don't get it. Despite the elegance of these languages, the libraries are not there. It seems to me it would take forever to get the simplest things done in these languages... Haskell's library is growing very rapidly, more so than Python's I'd say. Take a look at http://donsbot.wordpress.com/2009/03/16/visualising-the-haskell-universe/ if you're willing to count Hackage (sort of the equivalent of the Python cheese shop). The Haskell Platform (counterpart to Python stdlib) is also very actively expanding. Ocaml and Scheme both seem to me to be sort of stagnant. Scheme is an elegant fossil. Some people find Ocaml to be at a sweet spot, combining Python's convenience and the more important aspects of Haskell's expressiveness. I haven't used it myself. It seems to me that Haskell is attracting all the most advanced development attention. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin http://phr...@nospam.invalid wrote: Duncan Booth duncan.bo...@invalid.invalid writes: Is there a real-life sorting task that requires (or is far more efficient with) cmp and can't be easily achieved with key and reverse? There is no sorting task that *requires* cmp. If all else fails you can define a key class to wrap the original wrapper such that comparing the keys simply calls the comparison function that you *would* have used. I would count that as key being far less efficient, though still giving the same result. You'll notice I carefully didn't comment on the efficiency. However, without testing it I'm not convinced that it would be 'far less' efficient. Using cmp2key you have an overhead of one class per sort plus one instance for each element being sorted, and an additional one function call for each comparison. The only example given so far that would justify needing this (sorting trees which require recursive comparison) by its very nature would make both of these additional overheads insignificant as each element already contains multiple instances and each comparison contains multiple function calls. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin wrote: I still have never understood why cmp was removed. Sure, key is more convenient a lot (or maybe most) of the time, but it's not always. Not just more convenient. cmp will always be N log N, in that _every_ comparison runs your function, while key is linear, in that it is run once per element. Most cases are moreeasily done with key, and it is a good idea to make the most accessible way to a sort be the most efficient one. In the rare case that you really want each comparison, the cmp-injection function will do nicely (and can be written as a recipe. In short, make the easy path the fast path, and more will use it; provide two ways, and the first that springs to mind is the one used. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Scott David Daniels scott.dani...@acm.org writes: Most cases are moreeasily done with key, and it is a good idea to make the most accessible way to a sort be the most efficient one. In the rare case that you really want each comparison, the cmp-injection function will do nicely (and can be written as a recipe. I don't think wrapping the sorted objects in an otherwise useless special purpose class is nicely, either from a performance or from a code verbosity point of view. I avoid Java and its useless extra classes for a reason ;-). In short, make the easy path the fast path, and more will use it; provide two ways, and the first that springs to mind is the one used. I think we are saying the same thing. Python 2.x provides two ways and you can use whichever one fits the application better. I have never understood why Python 3.x finds it necessary to break one of them. Maybe I can migrate to Haskell by the time Python 2.x becomes deprecated. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
[Paul Rubin] Yes, think of sorting tree structures where you have to recursively compare them til you find an unequal pair of nodes. I'm not sure what you mean by this. What are the semantics of sorting a tree? Can you outline an example of tree that could be sorted easily with a cmp function but not a key function? t = [[9,6],[7,1]],[[2,5],[4,3]] # inputs t.sort(mycmp) # what would the cmp function be? print t # what would the output be Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger pyt...@rcn.com writes: I'm not sure what you mean by this. What are the semantics of sorting a tree? Can you outline an example of tree that could be sorted easily with a cmp function but not a key function? The idea was that you have a list of trees that you want to sort, and an ordering relation between trees: def gt(tree1, tree2): ... where you recursively descend both trees until you find an unequal pair of nodes. You're not trying to sort the nodes within a single tree. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Is this a homework? Challenge: to come up with a sorting task that cannot be achieved by passing to the sort method (or sorted function) suitable values for its key and reverse parameters, but instead *require* giving a value to its cmp parameter. Let me put up this question: how do you define the difference between an object to be sorted in the list, and the passed suitable value that is used for finding the position for the object in the list. If the only difference is that they need to be different Python objects, then you can always use [x] instead of x. This value is perfectly suitable. ;-) If you could give us somewhat stronger requirements regarding keys and objects, we may give you a different answer. Best, Laszlo -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
kj wrote: Challenge: to come up with a sorting task that cannot be achieved by passing to the sort method (or sorted function) suitable values for its key and reverse parameters, but instead *require* giving a value to its cmp parameter. For example, from random import random scrambled = some_list.sort(cmp=lambda x, y: cmp(random(), 0.5)) The above violates the transitivity requirement as stated below. The result will have a bias. can be achieved (to a very good approximation at least) with scrambled = some_list.sort(key=lambda x: random()) Is there a real-life sorting task that requires (or is far more efficient with) cmp and can't be easily achieved with key and reverse? The core developers don't think there is one. list.sort() no longer supports the cmp parameter in 3.x. Many thanks in advance, G. P.S. I guess that, if sort is going to produce a non-trivial, *consistent* order, a function CMP passed as the value of its cmp parameter must satisfy the following properties: 0. CMP(x, y) must be non-zero for some elements x, y of the list; 1. anti-symmetry: sign(CMP(x, y)) must be equal to -sign(CMP(y, x)); 2. transitivity: if sign(CMP(x, y)) equals sign(CMP(y, z)), then sign(CMP(x, z)) must be equal to sign(CMP(x, y)). In (1) and (2), sign() stands for the function -1 if x 0 sign(x) = 0 if x == 0 1 if x 0 I suppose that all bets are off if these properties are not satisfied, though the documentation does not make this entirely clear, as far as I can tell. (If I'm wrong about this alleged omission, please point me to the part of the docs that I missed.) -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
can be achieved (to a very good approximation at least) with scrambled = some_list.sort(key=lambda x: random()) Is there a real-life sorting task that requires (or is far more efficient with) cmp and can't be easily achieved with key and reverse? The core developers don't think there is one. list.sort() no longer supports the cmp parameter in 3.x. LOL :-D BTW if you really want to sort a list then you can. Using keys or values, it doesn't matter. The op's question has no practical usage. REALLY looks like a homework. L -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
kj wrote: Challenge: to come up with a sorting task that cannot be achieved by passing to the sort method (or sorted function) suitable values for its key and reverse parameters, but instead *require* giving a value to its cmp parameter. Such a task can't exist, because any arbitrary comparison function can be transformed into a key function. The idea behind the transformation is to construct wrapper objects that can compare themselves to other wrapper objects by invoking the given comparison function on their wrapped originals. Google for Hettinger cmp2key if you want to see the code that does this transformation. HTH, -- Carsten Haese http://informixdb.sourceforge.net -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Laszlo Nagy wrote: can be achieved (to a very good approximation at least) with scrambled = some_list.sort(key=lambda x: random()) Is there a real-life sorting task that requires (or is far more efficient with) cmp and can't be easily achieved with key and reverse? The core developers don't think there is one. list.sort() no longer supports the cmp parameter in 3.x. LOL :-D BTW if you really want to sort a list then you can. Using keys or values, it doesn't matter. The op's question has no practical usage. REALLY looks like a homework. L If it is, it's one he's contemplating giving his students. :-D ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
kj no.em...@please.post writes: Is there a real-life sorting task that requires (or is far more efficient with) cmp and can't be easily achieved with key and reverse? Yes, think of sorting tree structures where you have to recursively compare them til you find an unequal pair of nodes. To sort with key=... you have to wrap a class instance around each tree, with special comparison methods. Blecch. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin: Yes, think of sorting tree structures where you have to recursively compare them til you find an unequal pair of nodes. That's cute. In what situations do you have to perform such kind of sort? Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
kj no.em...@please.post wrote: Is there a real-life sorting task that requires (or is far more efficient with) cmp and can't be easily achieved with key and reverse? There is no sorting task that *requires* cmp. If all else fails you can define a key class to wrap the original wrapper such that comparing the keys simply calls the comparison function that you *would* have used. This has been discussed before and if you google sufficiently you should find the code that has been posted to do exactly that. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
On Oct 1, 10:08 am, kj no.em...@please.post wrote: Challenge: to come up with a sorting task that cannot be achieved by passing to the sort method (or sorted function) suitable values for its key and reverse parameters, but instead *require* giving a value to its cmp parameter. If you're assuming a consistent sort-order (transitivity, not evolving over time, etc), then the cmp method and key method are mathematically equivalent (you could always do a compare sort first, record the order produced, and assign the position number as a key function): # constructive proof of the existence of a key function # corresponding to a given cmp function tmplist = sorted(s, cmp=mycmp) pos = dict(zip(map(id, tmplist), range(len(tmplist result = sorted(s, key=lambda x: pos[id(x)]) assert tmplist == result Given equivalence, what is really at stake is convenience. If you're starting point is a cmp function (for instance, asking a dating service member whether they prefer mate x to mate y), then having to convert to a key function can be inconvenient. If you need to sort by an ascending primary key and a descending secondary key, you may find it easiest to sort in two passes (taking advantage of guaranteed sort stability): sorted(s, key=secondary, reversed=True) sorted(s, key=primary) With a cmp function, the above could be achieved in a single pass: def mycmp(x, y): p = cmp(primary(a), primary(b)) return p if p else cmp(secondary(b), secondary(a)) sorted(s, cmp=mycmp) Also, as Paul pointed-out, there are some structures such as tree comparisons that may be challenging to express directly as a key function. As Carsten pointed-out, the cmp2key function allows cmp functions to trivially (though indirectly) be recast as key functions. All that being said, for many everyday uses, a key function is simpler to write and faster to run than a cmp function approach. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Duncan Booth duncan.bo...@invalid.invalid writes: Is there a real-life sorting task that requires (or is far more efficient with) cmp and can't be easily achieved with key and reverse? There is no sorting task that *requires* cmp. If all else fails you can define a key class to wrap the original wrapper such that comparing the keys simply calls the comparison function that you *would* have used. I would count that as key being far less efficient, though still giving the same result. -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
In b8f7dea7-0fe3-4e25-9ffd-6796a6e2a...@a37g2000prf.googlegroups.com alex23 wuwe...@gmail.com writes: kj no.em...@please.post wrote: This example convinces me that it was a bad idea to get rid of cmp in Python 3, even if situations like this one are rare. It sounds like the entire point of this exercise was to get other people to confirm your bias for you. The only problem with this hypothesis is that my original bias was exactly the opposite to the one you quote: when I sent my original post I thought that cmp was in fact useless (after all I could not think of a situation that required it or even made it preferable), and was not even aware that it had been dropped in Python 3. Paul Rubin's post convinced me otherwise. BTW, what's with this business of ascribing underhanded motives to me? Earlier some other clown alleged that that my original post was homework??? WTF? kj -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Bearophile bearophileh...@lycos.com writes: Yes, think of sorting tree structures where you have to recursively compare them til you find an unequal pair of nodes. That's cute. In what situations do you have to perform such kind of sort? It came up in a search engine application I've been involved with. -- http://mail.python.org/mailman/listinfo/python-list