New submission from Ruben Vorderman <r.h.p.vorder...@lumc.nl>:

Python now uses the excellent timsort for most (all?) of its sorting. But this 
is not the fastest sort available for one particular use case.

If the number of possible values in the array is limited, it is possible to 
perform a counting sort: https://en.wikipedia.org/wiki/Counting_sort. In a 
counting sort each value maps to an integer corresponding to its relative 
value. The values are then counted by using key = map_to_key(value); 
count_array[key]. Then from this count_array a new array of sorted values can 
be constructed. (See the wikipedia article for a more detailed explanation).

For the python bytes and bytesarray types this is extremely simple to 
implement. All 256 possible values are can be directly used as keys. 

Rough implementation:
- Use buffer protocol to get pointer to bytes/bytesarray internal c-string
- Declare a count_array: Py_ssize_t[256] count_array . (use memset to set it to 
0)
- Iterate over the c-string and add each value to the countarray. 
count_array[buffer[i]] += 1
- Allocate a new bytes(array) object, or in the case of bytesarray the sorting 
can be performed inplace when bytesarray.sort() is used. 
- Iterate over the count_array. Get the number of values for each key and use 
memset to insert the sequences of keys into position.


The most obvious way to implement this algorithm will be as bytesarray.sort() 
where it is sorted inplace and as bytes.sort() which returns a new sorted bytes 
object. This is much much faster than using bytes(sorted(bytes)).

I made a quick cython implementation for speed testing here: 
https://github.com/rhpvorderman/bytes_sort/blob/main/bytes_sort.pyx

Currently to get a sorted bytestring one has to do bytes(sorted(my_bytes)). 

Test results:

# First make sure there is no regression when sorting an empty string
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes(sorted(b''))"
500000 loops, best of 5: 560 nsec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'')"
500000 loops, best of 5: 565 nsec per loop

# Test result for very small strings
$ python -m timeit -c "from bytes_sort import bytes_sort" 
"bytes(sorted(b'abc'))"
500000 loops, best of 5: 628 nsec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'abc')"
500000 loops, best of 5: 578 nsec per loop

# Even on a very small already sorted string, a counting sort is faster.

# Test with a proper string
$ python -m timeit -c "from bytes_sort import bytes_sort" 
"bytes(sorted(b'Let\'s test a proper string now. One that has some value to be 
sorted.'))"
100000 loops, best of 5: 2.32 usec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'Let\'s 
test a proper string now. One that has some value to be sorted.')"
500000 loops, best of 5: 674 nsec per loop

More than three times faster!

----------
components: C API
messages: 407032
nosy: rhpvorderman
priority: normal
severity: normal
status: open
title: Bytes and bytesarrays can be sorted with a much faster count sort.
type: performance
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45902>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to