Dennis Sweeney <[email protected]> added the comment:
If we were to do this, I think a better API might be to accept an arbitrary
iterable, then produce a sorted iterable:
def sorted_on_disk(iterable, key=None, reverse=False) -> Iterable:
...
It would sort chunks of the input and store them in files as sequences of
pickles, merging them as they got bigger, and then return an iterator over the
resulting single sorted file.
This would be more composable with other standard Python functions and would be
a good way of separating concerns. sorted(...) and heapq.merge(...) already
have the correct APIs to do it this way.
Potential implementation detail: For some small fixed n, always doing a 2^n-way
heapq.merge instead of a bunch of 2-way merges would do fewer passes over the
data and would allow the keys to be computed 1/n as many times, assuming we
wouldn't decorate-sort-undecorate.
----------
nosy: +Dennis Sweeney
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41678>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com