Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-10 Thread Koos Zevenhoven
On Mon, Mar 6, 2017 at 4:45 AM, Steven D'Aprano wrote: > On Sun, Mar 05, 2017 at 07:19:43AM +, Elliot Gorokhovsky wrote: > >> You may remember seeing some messages on here about optimizing list.sort() >> by exploiting type-homogeneity: since comparing apples and oranges

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Tim Peters
[Tim Delaney ] > Isn't there already always a scan of the iterable to build the keys array > for sorting (even if no key keyword param is specified)? No - `.sort()` is a list method, and as such has nothing to do with arbitrary iterables, just lists (perhaps you're

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Chris Barker
On Thu, Mar 9, 2017 at 2:04 AM, Barry Scott wrote: > It was not clear to me that you need to scan the list at the start to make > sure its homogeneous. Given that the type check is so cheap will it > slow the sort if you do the pointer check in the compare code? I am not

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Tim Delaney
On 9 March 2017 at 21:04, Barry Scott wrote: > It was not clear to me that you need to scan the list at the start to make > sure its homogeneous. Given that the type check is so cheap will it > slow the sort if you do the pointer check in the compare code? I am not >

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Elliot Gorokhovsky
On Thu, Mar 9, 2017 at 3:04 AM Barry Scott wrote: > > So you do O(nlogn)*2 pointer compares with my suggestion it seems? Which > is smaller the O(n) pointer checks? > Not sure what you mean here... pretty sure your inequality is backwards? Look -- the point is, we

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Barry Scott
> On 8 Mar 2017, at 22:58, Elliot Gorokhovsky > wrote: > > On Wed, Mar 8, 2017 at 2:14 PM Barry > wrote: > Can you assume that list of of type(list[0]) and use that type's optimised > sort? > But in the

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Chris Barker
On Wed, Mar 8, 2017 at 2:58 PM, Elliot Gorokhovsky < elliot.gorokhov...@gmail.com> wrote: > > The whole point of my patch is that we do O(nlogn) compares, but only have > O(n) elements, so it's much cheaper to do all the type checks in advance, > > I mean, practically speaking, the advance

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Elliot Gorokhovsky
On Wed, Mar 8, 2017 at 2:14 PM Barry wrote: > Can you assume that list of of type(list[0]) and use that type's optimised > sort? > But in the optimised sort code check that the types are as required. > If you hit an element that is not of the required type then fall back

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Erik
On 08/03/17 11:07, Steven D'Aprano wrote: I mentioned earlier that I have code which has to track the type of list items, and swaps to a different algorithm when the types are not all the same. Hmmm. Yes, I guess if the expensive version requires a lot of isinstance() messing or similar for

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Steven D'Aprano
On Wed, Mar 08, 2017 at 01:20:19AM +, Erik wrote: > >Part of the complexity here is that I'd like this flag to be available > >to Python code, not just a hidden internal state of the list. > > Out of interest, for what purpose? Generally, I thought Python code > should not need to worry

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Steven D'Aprano
On Tue, Mar 07, 2017 at 09:10:00PM +, Elliot Gorokhovsky wrote: > I think the bigger problem, though, is that most list use does *not* > involve sorting, There are other uses for a type hint apart from sorting. There may be optimized versions of other functions (sum, math.fsum, ...) and

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Steven D'Aprano
On Tue, Mar 07, 2017 at 08:46:45PM +, Erik wrote: > I don't think anyone has mentioned this yet, but FWIW I think the 'type > hint' may need to be tri-state: heterogeneous (NULL), homogeneous (the > pointer to the type structure) and also "unknown" (a sentinel value - > the address of a

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread David Mertz
On Tue, Mar 7, 2017 at 4:36 PM, Erik wrote: > * listextend() - this should do the right thing with the type hint when >> extending one list with another. >> > > * Several other methods ('contains', 'remove', 'count', 'index') also use > PyObject_RichCompareBool(). They

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Elliot Gorokhovsky
On Tue, Mar 7, 2017 at 1:47 PM Erik wrote: > > I'd prefer the sort optimization to be based on what my list contains > NOW, not on what it may have contained some time in the past, so I'm not > a fan of the "once heterogeneous, always considered heterogeneous" >

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Erik
On 07/03/17 20:46, Erik wrote: (unless it was acceptable that once heterogeneous, a list is always considered heterogeneous - i.e., delete always sets the hint to NULL). Rubbish. I meant that delete would not touch the hint at all. E. ___

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-06 Thread Chris Barker
On Sun, Mar 5, 2017 at 6:45 PM, Steven D'Aprano wrote: > I sometimes need to know if a list is homogenous, but unfortunately > checking large lists for a common type in pure Python is quote slow. > > Here is a radical thought... why don't lists track their common type >

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-06 Thread Terry Reedy
On 3/6/2017 1:56 AM, Elliot Gorokhovsky wrote: P.S. Is it OK if I close my current issue on the bug tracker and open a new issue, where I'll post the revised patch? The writing on my current issue uses the old, less-rigorous benchmarks, and I feel it would be less confusing if I just made a new

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 11:31 PM Tim Peters wrote: > > The best approach is indeed to pass the function pointer to every > location that needs it. Note that a MergeState struct is already > created per sort invocation, That isn't a file global for much the > same reason. >

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Chris Angelico
On Mon, Mar 6, 2017 at 5:52 PM, Tim Peters wrote: > [Chris Angelico ] >> Arbitrary comparison functions let you do anything but whoa, I >> cannot imagine any way that this would ever happen outside of "hey >> look, here's how you can trigger a

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Tim Peters
[Chris Angelico ] > Arbitrary comparison functions let you do anything but whoa, I > cannot imagine any way that this would ever happen outside of "hey > look, here's how you can trigger a SystemError"! CPython is full of defensive code protecting against malicious crap.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 11:39 PM INADA Naoki wrote: > > I think there is another safe solution: Gave up unsafe_object_compare. > > Compare function of long, float, and unicode must not call list.sort(), > and must not release GIL. > So all you need is, backup old

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Jelle Zijlstra
2017-03-05 22:08 GMT-08:00 Elliot Gorokhovsky : > Another solution: check if there is more than one thread; if there is, then > disable optimization. Is sorting in multithreaded programs common enough to > warrant adding the complexity to deal with it? > I think

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:45 PM Elliot Gorokhovsky < elliot.gorokhov...@gmail.com> wrote: > > the problem is, how can we get a unique identifier for the thread in a > platform-independent way? Any ideas? > Oh, I could probably just copy over code from threading.get_ident()... not sure if the

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:25 PM David Mertz wrote: > > Could we make the file global a table of comparison functions and have > each thread reference a position in the table? It would be fine if multiple > threads happened to use the position of e.g. the int comparison, just so

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 9:25 PM Tim Peters wrote: > > Would someone please move the patch along? I expect it's my fault it's > languished so long, since I'm probably the natural person to review it, but > I've been buried under other stuff. > > But the patch doesn't change

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Tim Peters
[Elliot Gorokhovsky ] > (Summary of results: my patch at https://bugs.python.org/issue28685 makes list.sort() 30-50% > faster in common cases, and at most 1.5% slower in the uncommon worst case.) > ... Would someone please move the patch along? I expect it's my

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 9:12 PM MRAB wrote: > > Although it's true that both programmers and Python might treat 10 as > functionally identical to 10.0, in practice the numbers that are being > added to the list probably come from some code that returns integers > /or/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread MRAB
On 2017-03-06 03:09, Chris Angelico wrote: On Mon, Mar 6, 2017 at 2:03 PM, Elliot Gorokhovsky wrote: On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote: I would be rather curious to know how frequently a list consists of "numbers", but a mix

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Nick Timkovich
I see the benchmarks, and while I assume the asymptotic complexity is the same, is there a longer "start-up" time your optimizations need? Do you have benchmarks where you sort 10, 100...10**6 items to show that beyond the types you're sorting, you're not amortizing any increased overhead out to

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:23 PM David Mertz wrote: > > But also, it's not just the number of list objects that are sorted, but > how often it's done. I could have a 10,000 line program with only one call > to `my_list.sort()` in it... but that one line is something that is

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:20 PM David Mertz wrote: > > B. I think a very large percentage of lists are heterogeneous. But most > of the time when they are, it's not because there are several different > numeric types but rather because a list collects some sort of custom >

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:10 PM Chris Angelico wrote: > > Also, the performance hit is so small, and even that is in the very > worst case (a homogeneous list with one different type at the end). I > like changes that make stuff run faster. > I agree.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread David Mertz
In my experience and/or estimation... well two things: A. Mixtures of floats/ints seem a lot more common than the 10% being thrown around. I don't even consider that bad practice currently (it might become bad practice if Elliot's patch is used, since keeping elements float-only would allow much

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread David Mertz
On Sun, Mar 5, 2017 at 6:45 PM, Steven D'Aprano wrote: > Here is a radical thought... why don't lists track their common type > themselves? There's only a few methods which can add items: > I had exactly the same thought. Lists would need to grow a new attribute, of

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote: > > I would be rather curious to know how frequently a list consists of > "numbers", but a mix of ints and floats. Does it happen a > lot in real-world code? > > This is of course undecidable to verify statically, so we can't

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
This is exactly what I would have done if I didn't think the changes would be too large-scale to ever be accepted (at least if proposed by someone without any experience). This is also what my friend suggested when I explained my project to him. In fact, this is exactly what dictionaries do: at

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Chris Angelico
On Mon, Mar 6, 2017 at 12:53 PM, Elliot Gorokhovsky wrote: > (1) lists are idiomatically type-homogeneous. To quote the Python > documentation, "Lists are mutable, and *their elements are usually > homogeneous* and are accessed by iterating over the list". > (2) it's

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Steven D'Aprano
On Sun, Mar 05, 2017 at 07:19:43AM +, Elliot Gorokhovsky wrote: > You may remember seeing some messages on here about optimizing list.sort() > by exploiting type-homogeneity: since comparing apples and oranges is > uncommon (though possible, i.e. float to int), it pays off to check if the >

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 6:30 PM INADA Naoki wrote: > Good job. I'll read your work later. > Thanks! So please don't use string-key dict specialization to rationalize > list-sort specialization. > I'm not quite doing that: I agree that dict is a special case because of

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread INADA Naoki
Good job. I'll read your work later. > relatively simple optimization. I would also add that Python dictionaries > already implement this optimization: they start out optimizing based on the > assumption that they'll only be seeing string keys, checking to make sure > that assumption holds as

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:51 AM David Mertz wrote: > Thanks for the hard work, this looks very promising. > Thank you! > Real world data tends to be mostly-sorted. So it would be useful for your > benchmarks to include: > > A) Performance on completely sorted data > i) Of

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread David Mertz
On Sat, Mar 4, 2017 at 11:19 PM, Elliot Gorokhovsky < elliot.gorokhov...@gmail.com> wrote: > (Summary of results: my patch at https://bugs.python.org/issue28685 makes > list.sort() 30-50% faster in common cases, and at most 1.5% slower in the > uncommon worst case.) > Thanks for the hard work,