[Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
Thanks for looking at this! That's why I spent months of my life (overall) devising a sequence of sorting algorithms for Python that reduced the number of comparisons needed. Yes, that's why I think this is so cool: for a couple dozen lines of code, we can get (at least for some cases, according

Re: [Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
quotes from now on. On Mon, Oct 10, 2016 at 9:38 PM Chris Angelico wrote: > On Tue, Oct 11, 2016 at 2:29 PM, Elliot Gorokhovsky > wrote: > > Ya, I think this may be a good approach for floats: if the list is all > > floats, just copy all the floats into a seperate array, use t

Re: [Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
co wrote: > On Tue, Oct 11, 2016 at 2:41 PM, Elliot Gorokhovsky > wrote: > > Oh no, the idea here is just you would copy over the floats associated > with > > the PyObject* and keep them in an array of such structs, so that we know > > which PyObject* are associat

Re: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
On Mon, Oct 10, 2016 at 10:15 PM Guido van Rossum wrote: > But that's suspicious in itself -- since no comparisons are needed to > > sort a 1-element list, if it's still faster, there must be something > > else you're doing (or not doing) that's affecting the time measured. > Oh, ya. Duh. So th

[Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-11 Thread Elliot Gorokhovsky
Warning: the contents of this message may be dangerous for readers with heart conditions. So today I looked at PyFloat_RichCompare. I had been scared initially because it was so complicated, I was worried I would miss some important error check or something if I special cased it. So I took the fol

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-11 Thread Elliot Gorokhovsky
So I got excited here. And the reason why is that I got those numbers *on Tim's benchmark*. When I got these kinds of numbers on my benchmarks, I figured there was probably a problem with they way I was timing, and certainly the gains couldn't be as extreme as they suggested. But this is on a bench

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-11 Thread Elliot Gorokhovsky
quoting private email as a general > courtesy, but I hereby give you permission if it was mine that was private. > [Though I think your summary was better than a quote anyhow.] > > -jJ > > On Oct 11, 2016 4:58 PM, "Elliot Gorokhovsky" < > elliot.gorokhov...@gmail.co

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 9:20 AM Tim Peters wrote: > > What I'm *not* quite clear on is why Python 3's change to reject > > comparisons between unrelated types makes this optimisation possible. > > It doesn't. It would also apply in Python 2. I simply expect the > optimization will pay off more

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Tue, Oct 11, 2016 at 9:56 PM Nick Coghlan wrote: > Once you get to the point of being able to do performance mentions on > a CPython build with a modified list.sort() implementation, you'll > want to take a look at the modern benchmark suite in > https://github.com/python/performance > Yup, t

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 5:36 AM Paul Moore wrote: > On 12 October 2016 at 11:16, Steven D'Aprano wrote: > > On Wed, Oct 12, 2016 at 12:25:16AM +, Elliot Gorokhovsky wrote: > > > >> Regarding generalization: the general technique for special-casing is > y

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 3:39 PM Nathaniel Smith wrote: > It looks like PyUnicode_Compare already has a special case to use > memcmp when both of the strings fit into latin1: > Wow! That's great! I didn't even try reading through unicode_compare, because I felt I might miss some subtle detail tha

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
> So what's Python doing? Is it a codepoint ordering? > ...ya...how is the python interpreter supposed to know what language strings are in? There is a unique ordering of unicode strings defined by the unicode standard, AFAIK. If you want to sort by natural language ordering, see here: https://pyp

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > But this isn't relevant to Python's str, because Python's str never uses > UTF-8. > Really? I thought in python 3, strings are all unicode... so what encoding do they use, then? ___ Python-ideas

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
at 4:08 PM Alexander Belopolsky < alexander.belopol...@gmail.com> wrote: > > On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky < > elliot.gorokhov...@gmail.com> wrote: > > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > > But this isn't relevant

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 4:26 PM Terry Reedy wrote: > I suspect that optimizing string sorting > will take some experimentation. If the initial item is str, it might be > worthwhile to record the highest 'kind' during the type scan, so that > strncmp can be used if all are ascii or latin-1. > My

[Python-ideas] Py_SIZE of PyLongs

2016-10-19 Thread Elliot Gorokhovsky
A quick note: I'm working on a special-case compare function for bounded integers for the sort stuff. By looking at the implementation, I figured out that Py_SIZE of a long is the sign times the number of digits (...right?). Before looking at the implementation, though, I had looked for this info

Re: [Python-ideas] Py_SIZE of PyLongs

2016-10-19 Thread Elliot Gorokhovsky
n the implementation I was able to figure out that Py_SIZE is interpreted as the sign times the number of digits (unless I'm missing something), but this should be in the docs IMO. On Wed, Oct 19, 2016 at 7:24 PM Thomas Nyberg wrote: > On 10/19/2016 09:04 PM, Elliot Gorokhovsky wrote: >

[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-04 Thread Elliot Gorokhovsky
(Summary of results: my patch at https://bugs.python.org/issue28685 makes list.sort() 30-50% faster in common cases, and at most 1.5% slower in the uncommon worst case.) Hello all, You may remember seeing some messages on here about optimizing list.sort() by exploiting type-homogeneity: since com

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:51 AM David Mertz wrote: > Thanks for the hard work, this looks very promising. > Thank you! > Real world data tends to be mostly-sorted. So it would be useful for your > benchmarks to include: > > A) Performance on completely sorted data > i) Of homogenous type >

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 6:30 PM INADA Naoki wrote: > Good job. I'll read your work later. > Thanks! So please don't use string-key dict specialization to rationalize > list-sort specialization. > I'm not quite doing that: I agree that dict is a special case because of internal use. I'm just sa

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
pending on whether the problem raised in the previous paragraph can be addressed. What do you think? On Sun, Mar 5, 2017 at 7:47 PM Steven D'Aprano wrote: On Sun, Mar 05, 2017 at 07:19:43AM +, Elliot Gorokhovsky wrote: > You may remember seeing some messages on here about optimi

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote: > > I would be rather curious to know how frequently a list consists of > "numbers", but a mix of ints and floats. Does it happen a > lot in real-world code? > > This is of course undecidable to verify statically, so we can't just crawl PyPI...

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:10 PM Chris Angelico wrote: > > Also, the performance hit is so small, and even that is in the very > worst case (a homogeneous list with one different type at the end). I > like changes that make stuff run faster. > I agree. _

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:20 PM David Mertz wrote: > > B. I think a very large percentage of lists are heterogeneous. But most > of the time when they are, it's not because there are several different > numeric types but rather because a list collects some sort of custom > objects. Maybe those e

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:09 PM David Mertz wrote: > > If we added __type_hint__ as None/type-object and added those comparisons > to it on .insert()/.append()/etc, then we would be slower by some increment > while all we were doing was adding things. There could only be a win when > the list is

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:23 PM David Mertz wrote: > > But also, it's not just the number of list objects that are sorted, but > how often it's done. I could have a 10,000 line program with only one call > to `my_list.sort()` in it... but that one line is something that is called > inside an inne

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:29 PM Nick Timkovich wrote: > I see the benchmarks, and while I assume the asymptotic complexity is the > same, is there a longer "start-up" time your optimizations need? Do you > have benchmarks where you sort 10, 100...10**6 items to show that beyond > the types you're

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 9:12 PM MRAB wrote: > > Although it's true that both programmers and Python might treat 10 as > functionally identical to 10.0, in practice the numbers that are being > added to the list probably come from some code that returns integers > /or/ floats, rather than a mixture

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 9:25 PM Tim Peters wrote: > > Would someone please move the patch along? I expect it's my fault it's > languished so long, since I'm probably the natural person to review it, but > I've been buried under other stuff. > > But the patch doesn't change anything about the sort

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:25 PM David Mertz wrote: > > Could we make the file global a table of comparison functions and have > each thread reference a position in the table? It would be fine if multiple > threads happened to use the position of e.g. the int comparison, just so > long as each cho

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:45 PM Elliot Gorokhovsky < elliot.gorokhov...@gmail.com> wrote: > > the problem is, how can we get a unique identifier for the thread in a > platform-independent way? Any ideas? > Oh, I could probably just copy over code from threading.get_ident()... n

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
Another solution: check if there is more than one thread; if there is, then disable optimization. Is sorting in multithreaded programs common enough to warrant adding the complexity to deal with it? On Sun, Mar 5, 2017 at 10:52 PM Elliot Gorokhovsky < elliot.gorokhov...@gmail.com> wrote:

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 11:21 PM Jelle Zijlstra wrote: > > I think using a global is unsafe even without multithreading, because > the compare function itself could end up doing list.sort() (it's > calling arbitrary Python code after all). > Right, of course. So, clearly, the only safe solution i

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 11:39 PM INADA Naoki wrote: > > I think there is another safe solution: Gave up unsafe_object_compare. > > Compare function of long, float, and unicode must not call list.sort(), > and must not release GIL. > So all you need is, backup old compare_function before sort, and

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 11:31 PM Tim Peters wrote: > > The best approach is indeed to pass the function pointer to every > location that needs it. Note that a MergeState struct is already > created per sort invocation, That isn't a file global for much the > same reason. > Right. It's a real sha

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-06 Thread Elliot Gorokhovsky
On Mon, Mar 6, 2017, 4:42 PM Jim J. Jewett wrote: (1) Good Job. Thanks! (3) Ideally, your graph would have the desired-to-be lines after the as-is lines; for English writing, that would mean putting your (short red) lines to the right of the (tall blue) lines. (4) I suspect colors other t

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Elliot Gorokhovsky
On Tue, Mar 7, 2017 at 1:47 PM Erik wrote: > > I'd prefer the sort optimization to be based on what my list contains > NOW, not on what it may have contained some time in the past, so I'm not > a fan of the "once heterogeneous, always considered heterogeneous" > behaviour if it's cheap enough to

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Elliot Gorokhovsky
On Tue, Mar 7, 2017 at 4:53 PM David Mertz wrote: > > > In [22]: class Eq(int): > def __eq__(self, other): > return True >: > In [23]: four, five, six = Eq(4), Eq(5), Eq(6) > In [24]: lst = [four, five, six] > In [25]: lst.count(Eq(7)) > Out[25]: 3 > > > How would this work (o

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Elliot Gorokhovsky
On Wed, Mar 8, 2017 at 2:14 PM Barry wrote: > Can you assume that list of of type(list[0]) and use that type's optimised > sort? > But in the optimised sort code check that the types are as required. > If you hit an element that is not of the required type then fall back to > the unoptimised sort

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Elliot Gorokhovsky
On Thu, Mar 9, 2017 at 3:04 AM Barry Scott wrote: > > So you do O(nlogn)*2 pointer compares with my suggestion it seems? Which > is smaller the O(n) pointer checks? > Not sure what you mean here... pretty sure your inequality is backwards? Look -- the point is, we already *do* the pointer check

[Python-ideas] Submitted a PR!

2017-03-09 Thread Elliot Gorokhovsky
Just submitted a PR implementing this: https://github.com/python/cpython/pull/582 -- just need someone to review it now :) Thanks for all your feedback, everyone! On Sun, Mar 5, 2017 at 12:19 AM Elliot Gorokhovsky < elliot.gorokhov...@gmail.com> wrote: > (Summary of results: my patch

Re: [Python-ideas] Submitted a PR!

2017-03-09 Thread Elliot Gorokhovsky
There is an "apple to apple" compare function. It's unsafe_object_compare. If you look at the pre-sort check, you will find that if the list is homogeneous but not of float, int, string, or tuple, it sets compare_funcs.key_richcompare = ob_type->tp_richcompare and sets compare_funcs.key_compare = u