[issue35010] sort by partially reversed key tuple

2018-10-18 Thread Cyker Way


Cyker Way  added the comment:

Thank you very much for the great discussion here, especially Tim's great 
threads in *python-ideas* that give neat and insightful answers to this problem 
in different ways:

-   

-   

Since this topic is closed, future discussions probably should go to other 
python forums. But it might be good to draw some conclusions here for future 
reference.

First of all, either single-pass sort with a vector key or multi-pass sort with 
a scalar key may work better, depending on the input. However, in most cases, 
using multi-pass sort for such problem is the right way to go in the current 
python implementation. The multi-pass sort algorithm typically runs 2x faster 
or so than a single-pass sort algorithm. This is likely due to constants rather 
than asymptotic complexity. But when measured in real time, multi-pass sort 
algorithm clearly wins in most cases.

If your input is so special that it aligns much better with single-pass sort 
algorithms (overwhelming the constant advantage of multi-pass sort algorithm), 
you may use a single-pass sort algorithm. But there are actually different ways 
of implementing so. The algorithm posted in Tim's second thread on python-ideas 
is in fact different from mine in this bug thread, where Tim used a wrapper 
class for the keys and I used a wrapper class for the scalars. Since there are 
`n` keys but can be as many as `n * m` scalars, my method would be using more 
wrapper objects. So I expected it to run slower than Tim's. To my surprise, 
sometimes it works better. The reason is later found to be easy to understand: 
Wrapper objects are created only when a column needs to be reversed. The number 
of columns to be reversed is actually a factor controlling the running time. To 
give a fair evaluation, I created 50 random rows with 8 columns and control 
the number of columns to be reversed. The evaluation s
 hows my algorithm could have running time 80%-120% that of Tim's (excluding 
the special case where no column needs to be reversed). This shows a new 
direction of optimizing such algorithms.

Finally, this issue was rejected because the added benefits were deemed not 
enough to complicate the implementation. Considering the current multi-pass 
sort algorithm has a huge advantage and is easy to implement in python, this 
decision is fair. Users who care less about performance may write a key adapter 
in their own code if they want to stick with builtin sort functions. Users who 
do care about performance can use single-pass sort techniques mentioned in this 
issue in case multi-pass sort doesn't work well with their data.

--
Added file: https://bugs.python.org/file47880/performance-2.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Would be worth to add a wrapper in functools which revert the sorting order?

class reverted_order:
def __init__(self, value):
self.value = value
def __lt__(self, other):
if isinstance(other, reverted_order):
other = other.value
return self.value.__ge__(other)
# __le__, __gt__, __ge__, __eq__, __ne__, __hash__

Then you could use key=lambda x: (x['url'], reverted_order(x['user'])).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

Sorry to comment on the closed issue. I think the multisort recipe [0] is 
pretty neat can be added to the sorting howto page instead of being lost in the 
mailing list thread. It was suggested for addition later in the thread but 
never got added.

I guess I will open up a separate issue for the open PR and possibly add the 
recipe with the PR if it's okay. Thoughts?

Thanks much for the explanation Tim :)

[0] https://mail.python.org/pipermail/python-ideas/2016-October/043045.html

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Tim Peters


Tim Peters  added the comment:

This comes up every few years, but that's about it.  Here's the iteration from 
2 years ago:

https://mail.python.org/pipermail/python-ideas/2016-October/043039.html

Follow the thread.  It contains easy-to-use wrappers for both "do it in 
multiple simple passes" and "do it in one messy pass" approaches.  It's 
impossible for an implementation to guess in advance which will be faster - it 
depends on the data, which only the user can know about in advance.  There's 
nothing the implementation could do to improve O() behavior regardless.

If there were "real" demand for this, someone by now would have packaged those 
wrappers and made them available on PyPI.  Since that seems not to have 
happened, I agree with Raymond rejecting this idea for the core at this time.  
There would be a higher-than-usual bar for that anyway, because the sorting 
code is already highly complex, and building in the wrappers would be a 
tedious, long-winded exercise in recoding in C what's _easily_ coded in Python 
already.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Cyker, thank you for the suggestion, but we're going to decline.   The sorting 
HOWTO docs show how to exploit sort stability with multiple passes to handle a 
mix of ascending and descending steps.  It would complicate the API to have an 
array of key functions, each with their own ascending and descending flag. 
Likewise, it would also complicate the implementation.  

The added complexity just isn't worth it when we already have a reasonable 
solution and when the use case itself isn't very common.

--
assignee:  -> rhettinger
nosy: +rhettinger
resolution:  -> rejected

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

> And you are free to use whatever sorting algorithms in its implementation for 
> this kind of task.

That's very kind of you *wink*

At this point, I don't think there's much more point to discussing this further 
until Tim Peters weighs in and lets us know what he thinks. If he loves the 
idea, and is able to implement it, it may happen; if he is luke-warm or hates 
it, probably not.

(My own *personal* view is that unless there is an obvious and consistent 
performance win from adding this, adding the extra complexity to both the sort 
implementation and the interface isn't worth the effort. Not every simple 
helper function needs to be a builtin.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Cyker Way


Cyker Way  added the comment:

As for performance, I think both single-pass and multi-pass sorts have 
worst-case time complexity `m * n * log(n)`, assuming the number of items is 
`n` and each item has dimension `m`. Whichever is faster seems to be 
data-dependent. So I made a more comprehensive performance evaluation in 
`performance-1.py`. It turns out either single-pass or multi-pass can be 
80-100% faster than the other, on different inputs.

Since single-pass and multi-pass sorts are good at different inputs and there 
is currently no statistics on real application data supporting either, both 
look OK to me. But I hope you can add something in the interface of sorting 
functions (mainly `sorted` and `list.sort`) so that users don't have to write 
that multi-pass sort again and again. If the `keyspec` format is deemed too 
complicated, `keys` and `reverses` also look good to me:

sorted(items, keys=(key0, key1, key2), reverses=(True, False, True))

And you are free to use whatever sorting algorithms in its implementation for 
this kind of task.

--
Added file: https://bugs.python.org/file47877/performance-1.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Cyker Way


Cyker Way  added the comment:

Previous upload `example.py` was missing `__eq__`. Updated in `example-1.py`.

--
Added file: https://bugs.python.org/file47876/example-1.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
keywords: +patch
pull_requests: +9282
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

The ``sorted`` docs links to the Sorting HOWTO, the ``list.sort`` docs should 
do the same.

https://docs.python.org/3/library/functions.html#sorted

https://docs.python.org/3/library/stdtypes.html#list.sort

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Cyker Way


Cyker Way  added the comment:

Multi-pass stable sorts should produce the correct result. But as the number of 
columns grow the code gets messy. For brevity this example only has 2 columns 
but it may be 10 or more in a real application. Furthermore, in some cases the 
application may need to remember which columns are sorted asc/desc so you have 
to keep that piece of data (termed `keyspec` but can be any meaningful name) 
anyway. It would be ideal if a builtin function can directly operate on this 
data instead of manually writing a multi-pass sorting logic;

The proposed prototype (`c.py`) is aimed at minimizing the amount of code 
needed to illustrate the problem and is not optimized for performance. There is 
a performance hit where it wraps one object into another, but this should be 
able to be avoided. If you want to compare real performance between single- and 
multi-pass sorts, you can run this script `performance.py` instead. My test 
result is that multi-pass sorting takes 75% more time, YMMV.

--
Added file: https://bugs.python.org/file47875/performance.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

@Serhiy I just checked the docs to add an example and it's explained with an 
example at [0] :)


```

Sorts are guaranteed to be stable. That means that when multiple records have 
the same key, their original order is preserved.

This wonderful property lets you build complex sorts in a series of sorting 
steps. For example, to sort the student data by descending grade and then 
ascending age, do the age sort first and then sort again using grade:

>>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary 
>>> key
>>> sorted(s, key=attrgetter('grade'), reverse=True)   # now sort on 
>>> primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The Timsort algorithm used in Python does multiple sorts efficiently because it 
can take advantage of any ordering already present in a dataset.

```

As noted TimSort makes this efficient by taking advantage of the sorting in the 
first run though it was not done as a single pass. If I am bench marking 
correctly in the attached file then for a list with 400 items using multiple 
pass sort is 4-5x faster than the suggested sorting in Python 3.7 and also more 
readable (though my lambda call is messy :)

Multiple pass sort :
3.871509724
Suggested sort :
17.237952677


[0] 
https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts

--
Added file: https://bugs.python.org/file47874/bpo35010_1.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Would be nice to add an example in the documentation.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

There was some discussion about it : 
https://lists.gt.net/python/python/539896#539896 . As suggested by Raymond in 
the thread the below can be used to get the desired output

items.sort(key=lambda r: r['user'], reverse=True)
items.sort(key=lambda r: r['url'])

--
nosy: +xtreak

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

Since sort is guaranteed to be stable, can't you sort in two runs?


py> values = ['bc', 'da', 'ba', 'abx', 'ac', 'ce', 'dc', 'ca', 'aby']
py> values.sort(key=itemgetter(1), reverse=True)
py> values.sort(key=itemgetter(0))
py> values
['ac', 'abx', 'aby', 'bc', 'ba', 'ce', 'ca', 'dc', 'da']


Its not as efficient as doing a single sort, but its easier to understand than 
a complex API to specify each item's sort direction individually, and therefore 
probably less likely to mess it up and get it wrong.

--
nosy: +steven.daprano, tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Cyker Way


New submission from Cyker Way :

The current `sorted` function is somewhat limited and doesn't cover a use case 
that frequently occurs in real applications: sort by a tuple of keys where each 
key can be in asc or desc order.

For example, you may have a list of site configs where each of specify which 
user on which site gets how much quota. This data may be loaded from a SQL 
table where there are 3 columns: url, user, quota. Often you may want to sort 
by url, but when you want to check which users have been allocated most quota, 
you probably want to sort by quota. Even more likely, when you are sorting by 
quota, you still want to sort by url for those having the same quota.

Unfortunately, current `sorted` function doesn't allow you to sort by desc 
quota and asc url at the same time, because the `reverse` parameter acts on the 
final result, regardless of columns. For numeric columns, there is a trick of 
using minus sign on the data. But this approach usually doesn't apply to other 
types of data.

The general solution is to enhance the key function, allowing users to specify 
each column to be considered, with an asc/desc flag:

keyspec = [
(itemgetter('url'), False),
(itemgetter('user'), True),
]

An example is worth 1k words. The full example is provided in the attachment.

It's not a lot of code to write but making this feature builtin can save a lot 
of work since sorting a SQL table or other sheet data is quite common in real 
applications.

--
components: Library (Lib)
files: c.py
messages: 327886
nosy: cykerway
priority: normal
severity: normal
status: open
title: sort by partially reversed key tuple
type: enhancement
versions: Python 3.8
Added file: https://bugs.python.org/file47873/c.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com