[Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

2015-08-19 Thread Feng Yu
Dear list,

This is forwarded from issue 6217 https://github.com/numpy/numpy/issues/6217

What is the way to implement DESC ordering in the sorting routines of numpy?

(I am borrowing DESC/ASC from the SQL notation)

For a stable DESC ordering sort, one can not  revert the sorted array via
argsort()[::-1] .

I propose the following API change to argsorts/sort. (haven't thought
about lexsort yet) I will use argsort as an example.

Currently, argsort supports sorting by keys ('order') and by 'axis'.
These two somewhat orthonal interfaces need to be treated differently.

1. by axis.

Since there is just one sorting key, a single 'reversed' keyword
argument is sufficient:

a.argsort(axis=0, kind='merge', reversed=True)

Jaime suggested this can be implemented efficiently as a
post-processing step.
(https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
there a reference to the algorithm?

Because all of the sorting algorithms for 'atomic' dtypes are using
_LT functions, a post processing step seems to be the only viable
solution, if possible.


2. by fields, ('order' kwarg)

A single 'reversed' keyword argument will not work, because some keys
are ASC but others are DESC, for example, sorting my LastName ASC,
then Salary DESC.

a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'),
('Salary', 'desc'))])

The parsing rule of order is:

- if an item is tuple, the first item is the fieldname, the second
item is DESC/ASC
- if an item is scalar, the fieldname is the item, the ordering is ASC.

This part of the code already goes to VOID_compare, which walks a
temporary copy of a.dtype to call the comparison functions.

If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
ASC/DESC flags, offsets, and comparison functions can all be
pre-compiled and passed into VOID_compare via c_metadata of the
temporary type-descriptor.

By just looking this will actually make VOID_compare faster by
avoiding calling several Python C-API functions. negating the return
value of cmp seems to be a negligable overhead in such a complex
function.

3. If both reversed and order is given, the ASC/DESC fields in 'order'
are effectively reversed.

Any comments?

Best,

Yu
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

2015-08-19 Thread Jaime Fernández del Río
On Wed, Aug 19, 2015 at 1:10 PM, Feng Yu rainwood...@gmail.com wrote:

 Dear list,

 This is forwarded from issue 6217
 https://github.com/numpy/numpy/issues/6217

 What is the way to implement DESC ordering in the sorting routines of
 numpy?

 (I am borrowing DESC/ASC from the SQL notation)

 For a stable DESC ordering sort, one can not  revert the sorted array via
 argsort()[::-1] .

 I propose the following API change to argsorts/sort. (haven't thought
 about lexsort yet) I will use argsort as an example.

 Currently, argsort supports sorting by keys ('order') and by 'axis'.
 These two somewhat orthonal interfaces need to be treated differently.

 1. by axis.

 Since there is just one sorting key, a single 'reversed' keyword
 argument is sufficient:

 a.argsort(axis=0, kind='merge', reversed=True)

 Jaime suggested this can be implemented efficiently as a
 post-processing step.
 (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
 there a reference to the algorithm?


My thinking was that, for native types, you can stably reverse a sorted
permutation in-place by first reversing item-by-item, then reversing every
chunk of repeated entries. Sort of the way you would reverse the words in a
sentence in-place: first reverse every letter, then reverse everything
bounded by spaces:

TURN ME AROUND - DNUORA EM NRUT - AROUND EM NRUT - AROUND ME NRUT -
AROUND ME TURN

We could even add a type-specific function to do this for each of the
native types in the npy_sort library.

As I mentioned in Yu's very nice PR
https://github.com/numpy/numpy/pull/6222, probably it is best to leave
the signature of the function alone, and have something like order='desc'
be the trigger for the proposed reversed=True.

Jaime



 Because all of the sorting algorithms for 'atomic' dtypes are using
 _LT functions, a post processing step seems to be the only viable
 solution, if possible.



 2. by fields, ('order' kwarg)

 A single 'reversed' keyword argument will not work, because some keys
 are ASC but others are DESC, for example, sorting my LastName ASC,
 then Salary DESC.

 a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'),
 ('Salary', 'desc'))])

 The parsing rule of order is:

 - if an item is tuple, the first item is the fieldname, the second
 item is DESC/ASC
 - if an item is scalar, the fieldname is the item, the ordering is ASC.

 This part of the code already goes to VOID_compare, which walks a
 temporary copy of a.dtype to call the comparison functions.

 If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
 ASC/DESC flags, offsets, and comparison functions can all be
 pre-compiled and passed into VOID_compare via c_metadata of the
 temporary type-descriptor.

 By just looking this will actually make VOID_compare faster by
 avoiding calling several Python C-API functions. negating the return
 value of cmp seems to be a negligable overhead in such a complex
 function.


 3. If both reversed and order is given, the ASC/DESC fields in 'order'
 are effectively reversed.

 Any comments?

 Best,

 Yu
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion