On Mon, Apr 08, 2019 at 10:34:19AM +0100, Paul Moore wrote: > On Mon, 8 Apr 2019 at 10:10, Steven D'Aprano <st...@pearwood.info> wrote: > > > Possibly the maintainer of bisect may decide that its not worth the > > change. But for the statistics module, I would certainly change the > > implementation of median() to look something vaguely like this: > > > > # was > > data = sorted(data) # may be expensive if data is large > > > > # may become > > if not (isinstance(data, list) and data.__issorted__): > > data = sorted(data) > > So just to be clear, this would be a change in behaviour - for a list > that is currently sorted on a key that is *not* "numerically > ascending", the code will no longer re-sort, and so will give the > wrong answer?
Correct. A thought comes to mind... Currently the two variant forms of median (median_low and _high) can work with non-numeric data: # don't take this example too seriously py> statistics.median_low(['low', 'high', 'high', 'low', 'very low']) 'low' so perhaps there is a use-case for non-numeric sorts. > I'm not saying that this is forbidden, just want to be clear if that's > what you mean (because my difficulty with the proposed attribute is > that it seems unreliable enough that I can't imagine a case where I'd > feel comfortable using it myself). Fair enough, but does it help you feel a bit better about the feature if we called it a "sort hint", and emphasized that it should only be used in the same sort of APIs where you might allow the caller to pass already_sorted=True to skip a redundant sort step? > PS I thought timsort was highly efficient given already sorted data? > Whether it's OK to rely on that I don't know, but did you take that > into account? Yes, timsort is very efficient with already sorted data, and I have :-) On my computer, to sort 50 million integers in place takes about 13.5 seconds; to sort it the next time takes 2.7 seconds. (That's effectively just galloping along the list, looking for out-of-place items and not finding any.) To anyone tempted to say "Get a better computer", if I had a better computer I'd be using a list of 50 billion integers and it would still take 2.7 seconds :-) -- Steven _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/