[Python-ideas] Submitted a PR!

2017-03-09 Thread Elliot Gorokhovsky
Just submitted a PR implementing this:
https://github.com/python/cpython/pull/582
-- just need someone to review it now :)

Thanks for all your feedback, everyone!

On Sun, Mar 5, 2017 at 12:19 AM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:

> (Summary of results: my patch at https://bugs.python.org/issue28685 makes
> list.sort() 30-50% faster in common cases, and at most 1.5% slower in the
> uncommon worst case.)
>
> Hello all,
>
> You may remember seeing some messages on here about optimizing list.sort()
> by exploiting type-homogeneity: since comparing apples and oranges is
> uncommon (though possible, e.g. float to int), it pays off to check if the
> list is type-homogeneous (as well as homogeneous with respect to some other
> properties), and, if it is, to replace calls to PyObject_RichCompareBool
> with calls to ob_type->tp_richcompare (or, in common special cases, to
> optimized compare functions). The entire patch is less than 250 lines of
> code, most of which is pretty boilerplate (i.e. a lot of assertions in
> #ifdef Py_DEBUG blocks, etc).
>
> I originally wrote that patch back in November. I've learned a lot since
> then, both about CPython and about mailing list etiquette :). Previous
> discussion about this can be found at
> https://mail.python.org/pipermail/python-dev/2016-October/146648.html and
> https://mail.python.org/pipermail/python-ideas/2016-October/042807.html.
>
> Anyway, I recently redid the benchmarks much more rigorously (in
> preparation for presenting this project at my school's science fair),
> achieving a standard deviation of less than 0.5% of the mean for all
> measurements. The exact benchmark script used can be found at
> https://github.com/embg/python-fastsort-benchmark (it's just sorting
> random lists of [type], or lists of tuples of [type]). While listsort.txt
> talks about benchmarking different kinds of structured lists, instead of
> just random lists, the results here would hold in those cases just as
> well, because this optimization makes individual comparisons cheaper,
> rather than reducing the number of comparisons based on structure.
>
> I also made a poster describing the optimization and including a pretty
> graph displaying the benchmark data:
> https://github.com/embg/python-fastsort-benchmark/blob/master/poster.pdf.
> For those who would rather read the results here (though it is a *really*
> pretty graph):
>
> ***
> Percent improvement for sorting random lists of [type]
> (1-patched/unpatched):
> float: 48%
> bounded int (magnitude smaller than 2^32): 48.4%
> latin string (all characters in [0,255]): 32.7%
> general int (reasonably uncommon?): 17.2%
> general string (reasonably uncommon?): 9.2%
> tuples of float: 63.2%
> tuples of bounded int: 64.8%
> tuples of latin string: 55.8%
> tuples of general int: 50.3%
> tuples of general string: 44.1%
> tuples of heterogeneous: 41.5%
> heterogeneous (lots of float with an int at the end; worst-case): -1.5%
> ***
>
> Essentially, it's a gamble where the payoff is 20-30 times greater than
> the cost, and the odds of losing are very small. Sorting is perhaps not a
> bottleneck in most applications, but considering how much work has gone
> into Python's sort (Timsort, etc; half of listobject.c is sort code), I
> think it's interesting that list.sort() can be made essentially twice as
> fast by a relatively simple optimization. I would also add that Python
> dictionaries already implement this optimization: they start out optimizing
> based on the assumption that they'll only be seeing string keys, checking
> to make sure that assumption holds as they go. If they see a non-string
> key, they permanently switch over to the general implementation. So it's
> really the same idea, except here it doesn't matter as much what type we're
> dealing with, which is important, because lists are commonly used with lots
> of different types, as opposed to dictionaries, which overwhelmingly
> commonly use string keys, especially internally. (Correct me if I'm wrong
> in any of the above).
>
> I got a lot of great feedback/input on this patch as I was writing it, but
> after submitting it, I didn't hear much from anybody. (The reason I took so
> long to post was because I wanted to wait until I had the chance to do the
> benchmarks *right*). What do you all think?
>
> Thanks,
> Elliot
>

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Elliot Gorokhovsky
On Thu, Mar 9, 2017 at 3:04 AM Barry Scott  wrote:

>
> So you do O(nlogn)*2 pointer compares with my suggestion it seems? Which
> is smaller the O(n) pointer checks?
>

Not sure what you mean here... pretty sure your inequality is backwards?

Look -- the point is, we already *do* the pointer checks during every
compare. That's the current implementation. I believe my benchmarks are
sufficient to prove that my patch is much faster than the current
implementation. Which makes sense, because we do one type check per object,
as opposed to something like log n per object, and we do them all at once,
which I do believe is faster than doing them across calls.

Now, you're not entirely wrong -- the current implementation doesn't do the
type checks as efficiently as it could. Specifically, it does them once in
PyObject_RichCompareBool, and then *again* in ob_type->tp_richcompare
(which it has to for safety: ob_type->tp_richcompare doesn't know the
arguments have already been type-checked!) You could totally define
optimized compares that do the type-checks more efficiently, and you would
see a benefit, just not nearly as extreme as the benefit I demonstrate.
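
To make the redundancy concrete, here is a simplified sketch of the generic
compare path (illustrative only; names and error handling are simplified
relative to the actual CPython source):

/* check #1: dispatch on the type just to find the right slot to call */
richcmpfunc cmp = Py_TYPE(v)->tp_richcompare;

/* check #2 happens inside the slot: float_richcompare, for example,
   re-verifies that both arguments really are floats before touching the
   raw doubles, because it can be handed arbitrary objects */
PyObject *res = cmp(v, w, Py_LT);
int islt = (res == NULL) ? -1 : (res == Py_True);
Py_XDECREF(res);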

Thanks again for your feedback!
Elliot

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Elliot Gorokhovsky
On Wed, Mar 8, 2017 at 2:14 PM Barry  wrote:

> Can you assume that the list is of type(list[0]) and use that type's optimised
> sort?
> But in the optimised sort code check that the types are as required.
> If you hit an element that is not of the required type then fall back to
> the unoptimised sort.
>

Well, how would you tell if you've hit an element that is not of the
required type? You'd have to check during every compare, right? And that's
precisely what we're trying to avoid!

The whole point of my patch is that we do O(nlogn) compares, but only have
O(n) elements, so it's much cheaper to do all the type checks in advance,
and in the very likely case that our list is homogeneous, switch to an
optimized special-case compare function.

Even when we only do O(n) compares, my patch is still much faster (see
benchmarks higher up in this thread). Why? Well, if you're doing the type
checks during the compares, you're doing them across different function
calls, with other stuff interspersed in between. So pipeline/branch
prediction/caching is less able to optimize away the overhead of the safety
checks (I don't know how CPUs work, but one of those is relevant here).
With the pre-sort check, it's all in a single iteration through the list,
and we're taking the same exact branch every time; it's much faster. Think
iterating over a matrix row-wise versus iterating column-wise (not exactly
relevant here since that's about cache, but that's the idea. Again, I don't
really know how CPUs work).

So in short: no, we can't just check as we go, because that's what we're
already doing. But we can optimize significantly by checking in advance. I
mean, practically speaking, the advance check is basically free. The
compare-time checks, in sum, are not, both asymptotically and practically.
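
For concreteness, the advance check amounts to one cheap loop of pointer
comparisons, something like this (a sketch with assumed names, not the
patch verbatim):

PyTypeObject *common = Py_TYPE(keys[0]);
int homogeneous = 1;
for (Py_ssize_t i = 1; i < n; i++) {
    if (Py_TYPE(keys[i]) != common) {  /* one pointer compare per element */
        homogeneous = 0;
        break;                         /* bail at the first mismatch */
    }
}
/* if homogeneous, install an optimized compare for `common`;
   otherwise fall back to PyObject_RichCompareBool as before */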

Best
Elliot

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Elliot Gorokhovsky
On Tue, Mar 7, 2017 at 1:47 PM Erik  wrote:

>
> I'd prefer the sort optimization to be based on what my list contains
> NOW, not on what it may have contained some time in the past, so I'm not
> a fan of the "once heterogeneous, always considered heterogeneous"
> behaviour if it's cheap enough to avoid it.
>

Sure. Dictionaries actually don't implement this, though: as soon as they
see a non-string key, they permanently switch to a heterogeneous state
(IIRC).

I think the bigger problem, though, is that most list use does *not*
involve sorting, so it would be a shame to impose the non-trivial overhead
of type-checking on *all* list use. With dictionaries, you have to
type-check inserts anyway (for equality testing), so it's less of a
problem.  But the fundamental list operations *don't* require type-checking
currently, so why add it in? In practice, the pre-sort check is *very*
cheap, and its cost is only imposed on sort usage.

Anyway, my patch could always be a precursor to a more general optimization
along these lines. I'm almost finished fixing the problem Tim identified
earlier in this thread; after that, it'll be ready for review!

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 11:31 PM Tim Peters  wrote:

>
> The best approach is indeed to pass the function pointer to every
> location that needs it.  Note that a MergeState struct is already
> created per sort invocation, That isn't a file global for much the
> same reason.
>

Right. It's a real shame, because the patch as it stands right now is
extremely surgical, but there's clearly no way around it. There are some
functions that don't take in MergeState (e.g. gallop_left) but that call
ISLT/IFLT, so I think I'll just add a compare function pointer parameter to
all the function calls. I mean, the diff will be a lot hairier, which might
make the patch more difficult to review, but when it comes down to it the
code complexity won't really increase, so overall I don't think this is the
end of the world.
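
Roughly, the change I have in mind looks like this (just a sketch; the real
ISLT also has to propagate comparison errors):

/* the macro takes the compare explicitly instead of using a global... */
#define ISLT(X, Y, COMPARE) ((COMPARE)((X), (Y)))

/* ...and every helper that compares grows a parameter, e.g.: */
static Py_ssize_t
gallop_left(PyObject *key, PyObject **a, Py_ssize_t n, Py_ssize_t hint,
            int (*compare)(PyObject *, PyObject *));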

Thanks so much for pointing this out!

P.S. Is it OK if I close my current issue on the bug tracker and open a new
issue, where I'll post the revised patch? The writing on my current issue
uses the old, less-rigorous benchmarks, and I feel it would be less
confusing if I just made a new issue and posted the correct
benchmarks/discussion at the top. The current issue doesn't have many
comments, so not much would be lost by just linking to it from the new
issue. If this violates bug tracker etiquette, however, :) I'll just post
the revised patch on my current issue.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 11:39 PM INADA Naoki  wrote:

>
> I think there is another safe solution: give up unsafe_object_compare.
>
> The compare functions of long, float, and unicode must not call
> list.sort(), and must not release the GIL.
> So all you need to do is back up the old compare function before the
> sort, and restore it afterward.
>

That would definitely work, but I think it's too much of a sacrifice for
these reasons:
(1) You would have to give up on optimizing tuples (because tuple compares
call PyObject_RichCompareBool, which could itself call arbitrary code).
That's a real shame, because sorting tuples is common (e.g., in DEAP, you
sort tuples of floats representing the fitnesses of individuals). The
alternative would be to only optimize tuples of long/string/float. However,
I think the code complexity of *that* would be greater than the, admittedly
messy, solution of passing in the compare everywhere it's needed.
(2) There are lots of types, e.g. bytes, that would fall through the cracks.
(3) Correct me if I'm wrong, but this would make us depend on the GIL, right?
And we don't want to depend on that if we don't have to, right?

It seems to me that the only solution here is to go in and add a compare
function pointer to each function call. Extremely unfortunate, but
necessary.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:45 PM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:

>
> the problem is, how can we get a unique identifier for the thread in a
> platform-independent way? Any ideas?
>

Oh, I could probably just copy over code from threading.get_ident()... not
sure if the key-value table is a good solution, though.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:25 PM David Mertz  wrote:

>
> Could we make the file global a table of comparison functions and have
> each thread reference a position in the table? It would be fine if multiple
> threads happened to use the position of e.g. the int comparison, just so
> long as each chose the right slot in the table.
>

I don't think that would solve anything. The problem is that the pre-sort
check is carried out in the main sort function, but then lots of smaller
functions need to know which compare to use. Since the scope isn't shared,
then, how can we inform the smaller functions of which compare to use?

I solved this problem by just putting the compare function pointer in
global scope. Obviously, however, this is not thread-safe. So the question
is, how can the smaller functions be made aware of the results of the
pre-sort check? Your proposal would merely replace the problem of
communicating the function pointer with the problem of communicating the
appropriate index into the function table. The smaller functions wouldn't
know which index they're supposed to use unless you passed it in.

However, I believe the following *would* work:
Suppose it were possible to access an identifier unique to each thread,
such as that returned by gettid() on Linux. Then maintain a global table of
(unique_thread_id, compare_func) pairs, and simply have the ISLT macro
index into the table! A bit of a performance hit, sure, but hopefully not
that bad? Certainly better than passing it in every time... the problem is,
how can we get a unique identifier for the thread in a platform-independent
way? Any ideas?
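
Something like the following is what I have in mind (purely a sketch; the
fixed table size and linear scan are just for illustration):

#include "pythread.h"

#define MAX_SORTS 64
static struct {
    unsigned long tid;                       /* PyThread_get_thread_ident() */
    int (*compare)(PyObject *, PyObject *);  /* chosen by the pre-sort check */
} compare_table[MAX_SORTS];

static int (*lookup_compare(void))(PyObject *, PyObject *)
{
    unsigned long tid = PyThread_get_thread_ident();
    for (int i = 0; i < MAX_SORTS; i++)
        if (compare_table[i].tid == tid)
            return compare_table[i].compare;
    return NULL;  /* no entry: fall back to PyObject_RichCompareBool */
}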

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 9:25 PM Tim Peters  wrote:

>
> Would someone please move the patch along?  I expect it's my fault it's
> languished so long, since I'm probably the natural person to review it, but
> I've been buried under other stuff.
>
> But the patch doesn't change anything about the sorting algorithm itself -
> even shallow knowledge of how timsort works is irrelevant.  It's just
> plugging in a different bottom-level object comparison _function_ when that
> appears valuable.
>
> I've said from the start that it's obvious (to me ;-) ) that it's an
> excellent tradeoff.  At worst it adds one simple (pre)pass over the list
> doing C-level pointer equality comparisons.  That's cheap.  The worst-case
> damage is obviously small, the best-case gain is obviously large, and the
> best cases are almost certainly far more common than the worst cases in
> most code.
>

Thank you so much for the support! Yes to all of those things!


>
> In reviewing my own code, after it was fiddled to work under Python 3
> there are no mixed-type lists that are ever sorted.  There are lists with
> complex objects, but whenever those are sorted there's a `key=` argument
> that reduces the problem to sorting ints or tuples of builtin scalar types.
>

I'm adding that quote to the next version of my poster :)


>
> One subtle thing to look at:  thread safety.  IIRC, the patch plugged the
> comparison function into a file global.  That's obviously hosed if multiple
> threads sort different kinds of lists simultaneously.
>

Wow, that is a *very* good point. I never think about those kinds of
things, being a n00b, so thanks for catching that... I'll have to go in and
fix that. I'm not sure how, though, because the ISLT macro gets called in a
bunch of different functions. The only way I can think of to fix it would
be to pass the function pointer as an argument to *all* the functions that
use ISLT, which would be pretty messy. What do you think would be the
easiest fix?

Thanks!
Elliot

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 9:12 PM MRAB  wrote:

>
> Although it's true that both programmers and Python might treat 10 as
> functionally identical to 10.0, in practice the numbers that are being
> added to the list probably come from some code that returns integers
> /or/ floats, rather than a mixture.
>

Yes, exactly. So we can see how the homogeneity assumption is a reasonable
one to make; unless you're defining custom compares (uncommon), I don't see
where you would ever be sorting a heterogeneous list except for the
int/float case.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:23 PM David Mertz  wrote:

>
> But also, it's not just the number of list objects that are sorted, but
> how often it's done.  I could have a 10,000 line program with only one call
> to `my_list.sort()` in it... but that one line is something that is called
> inside an inner loop.  This feels like it really needs profiling not just
> static analysis.
>

Totally -- but what I mean is if you look at the performance benchmark
suite, for example, most of the benchmarks do not have the string "sort" in
their source code (IIRC). I think that most applications don't spend most
of their time sorting. That's not to say sorting isn't important -- I
wouldn't have written my patch if I didn't think it is. I'm just saying it
probably isn't a good idea to penalize *all* list use just to benefit the
minority of list use that involves sorting.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:20 PM David Mertz  wrote:

>
> B. I think a very large percentage of lists are heterogeneous.  But most
> of the time when they are, it's not because there are several different
> numeric types but rather because a list collects some sort of custom
> objects.  Maybe those even all share some ancestor, but the point is they
> are user/library defined classes that won't have a fast comparison like
> ints or floats do.
>

As long as the ob_type for all the objects is the same, my patch will sort
significantly faster, as it replaces PyObject_RichCompareBool with
ob_type->tp_richcompare. It doesn't matter if your list is builtin-types or
not, as long as the types are all the same, you get a speedup (though it's
greater for built-in types).

Also, I actually wrote a crawler to see how many PyPI packages implement a
custom compare function (like you would have to for user-defined types).
The answer is less than 6%. Code:
https://github.com/embg/python-fast-listsort/tree/master/crawler


> B (part 2): I haven't actually read his patch, but I'm assuming that
> Elliot's approach can start scanning the list, and as soon as it find a
> complex/custom object at index 0 ignore the rest of the scan.  So for that
> case, there is very little harm in his linear scan (over one item).
>

Yup, the pre-sort check breaks out of the loop as soon as it sees
heterogeneity. So unless your float is at the end of the whole list of ints
(as in my worst-case benchmark), the cost is basically 0.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 8:10 PM Chris Angelico  wrote:

>
> Also, the performance hit is so small, and even that is in the very
> worst case (a homogeneous list with one different type at the end). I
> like changes that make stuff run faster.
>

I agree.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico  wrote:

>
> I would be rather curious to know how frequently a list consists of
> "numbers", but a mix of ints and floats. Does it happen a
> lot in real-world code?
>
>
This is of course impossible to verify statically, so we can't just crawl
PyPI... however, I would argue that using mixed float-int lists is
dangerous, and is more dangerous in Python 3 than in Python 2. So hopefully
this is not very common. However, even if 10% (surely a vast overestimate)
of sort calls are to mixed int-float lists, my patch would still yield a
significant savings on average.

> (Apologies if this has already been mentioned; I've been skimming the
> thread, not reading it in detail.)


It hasn't been mentioned in this thread :)

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
This is exactly what I would have done if I didn't think the changes would
be too large-scale to ever be accepted (at least if proposed by someone
without any experience). This is also what my friend suggested when I
explained my project to him. In fact, this is exactly what dictionaries do:
at each insert, they check whether a string is being inserted. If yes, they
stay in a "string" state; if no, they switch to your None state.

Again, the only reason I did it differently is because I wanted to be
realistic about what changes I could get accepted. My patch does everything
inside the list sort function, and doesn't modify anything outside of that.
So it's presumably much easier to review. I mean, I submitted my patch in
November, and it still hasn't been accepted :) so presumably something as
large-scale as this would have very little chance.

One more thing: changing mutating methods to check type would make them
non-trivially slower; if you aren't resizing the list, appending should
really just be as simple as storing a pointer and incrementing a counter.
If we have to check type, that turns two simple things into three simple
things, which could have significant performance implications. Most
applications of lists *don't* care about type homogeneity (i.e. iterating
through the list or whatever), so one would have to justify imposing this
cost on all list use just to optimize for the special cases where you *do*
care. I'm not sure how one would go about checking if that trade-off is a
good one or not: what is the distribution of list use across all Python
code? It's not something you can really check statically (it's undecidable
if you're being pedantic). With my patch, it's easy: if you aren't sorting,
you don't have to pay anything. If you are sorting, you either win or lose,
based on whether or not you're sorting type-homogeneous data or not (which
you basically always are).
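
To spell out the append cost (simplified; common_type is a hypothetical
field, not something lists actually have):

/* hot path of list.append today, assuming no resize is needed: */
self->ob_item[Py_SIZE(self)] = item;  /* store a pointer */
Py_SIZE(self) += 1;                   /* bump the counter; ob_size was
                                         assignable this way in 2016 CPython */

/* tracking homogeneity would put a compare-and-branch on every append: */
if (self->common_type != NULL && Py_TYPE(item) != self->common_type)
    self->common_type = NULL;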

If anything, I think my patch could be a starting point for later
optimization along these lines, depending on whether the problem raised in
the previous paragraph can be addressed. What do you think?

On Sun, Mar 5, 2017 at 7:47 PM Steven D'Aprano <st...@pearwood.info> wrote:

On Sun, Mar 05, 2017 at 07:19:43AM +, Elliot Gorokhovsky wrote:

> You may remember seeing some messages on here about optimizing list.sort()
> by exploiting type-homogeneity: since comparing apples and oranges is
> uncommon (though possible, e.g. float to int), it pays off to check if the
> list is type-homogeneous

I sometimes need to know if a list is homogeneous, but unfortunately
checking large lists for a common type in pure Python is quite slow.

Here is a radical thought... why don't lists track their common type
themselves? There's only a few methods which can add items:

- append
- extend
- insert
- __setitem__


Suppose we gave lists a read-only attribute, __type_hint__, which
returns None for heterogeneous lists and the type for homogeneous lists.
Adding an item to the list behaves as follows:

- if the list is empty, adding an item sets __type_hint__ to type(item);
- if the list is not empty, adding an item tests whether type(item)
  is identical to (not a subclass of) __type_hint__, and if not, sets
  __type_hint__ to None;
- removing an item doesn't change the __type_hint__ unless the list
  becomes empty, in which case it is reset to None;
- if the internal allocated space of the list shrinks, that triggers
  a recalculation of the __type_hint__ if it is currently None.

(There's no need to recalculate the hint if it is not None.)

Optional: doing a list.sort() could also recalculate the hint.


The effect will be:

- if __type_hint__ is a type object, then you can be sure that
  the list is homogeneous;

- if the __type_hint__ is None, then it might still be
  homogeneous, but it isn't safe to assume so.

Not only could sorting take advantage of the type hint without needing
to do a separate O(N) scan of the list, but so could other code. I know
I would be interested in using this. I have a fair amount of code that
has to track the type of any items seen in a list, and swap to a "type
agnostic but slow" version if the list is not homogeneous. I could
probably replace that with some variation of:

if thelist.__type_hint__ is None:
    process_slow(thelist)
else:
    process_fast(thelist)


At the very least, I'd be interested in experimenting with this.

Thoughts?



--
Steve

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 6:30 PM INADA Naoki  wrote:

> Good job.  I'll read your work later.
>

Thanks!

> So please don't use string-key dict specialization to rationalize
> list-sort specialization.
>

I'm not quite doing that: I agree that dict is a special case because of
internal use. I'm just saying that I'm not the first person to try to
exploit type homogeneity in data structures in CPython. However, my
justification is independent of dict specialization. Namely, it's the
following two considerations:

(1) lists are idiomatically type-homogeneous. To quote the Python
documentation, "Lists are mutable, and *their elements are usually
homogeneous* and are accessed by iterating over the list".
(2) it's not very common to have to "compare apples and oranges". While
it's technically possible to define comparison between any two types you
like, and in practice one sometimes compares e.g. ints and floats, in
practice it's pretty safe to assume the lists you're sorting are going to
be type-homogeneous 95% or 99% of the time.

> I feel two (int, string) or three (int, string, tuple) special cases may
> be worthwhile if the code is clean and the benefit seems good.
>

I agree -- I only optimized for int, string, float, and tuple. However, my
code optimizes sorting for *any* type-homogeneous list as well, by simply
replacing PyObject_RichCompareBool with ob_type->tp_richcompare, saving the
method lookup time. This is what gives the speedups for non-latin strings
and bigints in the benchmarks I shared in my original post: I don't have a
special case compare function for them, but by setting the compare function
pointer equal to ob_type->tp_richcompare, I can at least save a little bit of
method lookup time.
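
Concretely, that fallback looks something like this (a sketch with assumed
names, not the patch verbatim):

static PyObject *(*cached_richcompare)(PyObject *, PyObject *, int);
/* set to common_type->tp_richcompare by the pre-sort check */

static int
unsafe_object_lt(PyObject *v, PyObject *w)
{
    PyObject *res = cached_richcompare(v, w, Py_LT);  /* no slot lookup */
    if (res == NULL)
        return -1;
    int islt = PyObject_IsTrue(res);
    Py_DECREF(res);
    return islt;
}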

In terms of clean code, the special-case compare functions include
assertions in #ifdef Py_DEBUG blocks that list all the requirements
necessary to make them safe. The pre-sort check then verifies the
assumptions for whichever optimized compare is going to be used. So all
that's necessary to verify correctness is to convince oneself that (1) the
assertions are sufficient and (2) the pre-sort check will never use a
compare function for which it cannot prove the assertions. These two points
are pretty easy to check:
(1) Just plug in the assertions in the original code, e.g. unicode_compare
in Objects/unicodeobject.c. Then you have a bunch of if(1) and if(0)
blocks, and if you delete the if(0) blocks you'll get exactly what I have
in the patch.
(2) The pre-sort check code is very simple, and well commented.
I would add further that this patch only actually adds one line of code to
listsort: assign_compare_function(lo.keys, saved_ob_size). The rest of the
patch is just function definitions, and replacing PyObject_RichCompareBool
with a function pointer (in the macro for ISLT(X,Y)).

Anyway, thanks in advance for your time!
Elliot

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Elliot Gorokhovsky
On Sun, Mar 5, 2017 at 10:51 AM David Mertz  wrote:

> Thanks for the hard work, this looks very promising.
>

Thank you!


> Real world data tends to be mostly-sorted.  So it would be useful for your
> benchmarks to include:
>
> A) Performance on completely sorted data
>   i) Of homogeneous type
>   ii) Of mixed type
> B) Performance on mostly-sorted data
>   i) Of homogeneous type
>   ii) Of mixed type
>

You are entirely correct, as the benchmarks below demonstrate. I used the
benchmark lists from Objects/listsort.txt, which are:

  \sort: descending data
  /sort: ascending data
  3sort: ascending, then 3 random exchanges
  +sort: ascending, then 10 random at the end
  %sort: ascending, then randomly replace 1% of elements w/ random values
  ~sort: many duplicates
  =sort: all equal

My results are below (the script can be found at
https://github.com/embg/python-fastsort-benchmark/blob/master/bench-structured.py
):
Homogeneous ([int]):
\sort: 54.6%
/sort: 56.5%
3sort: 53.5%
+sort: 55.3%
%sort: 52.4%
~sort: 48.0%
=sort: 45.2%

Heterogeneous ([int]*n + [0.0]):
\sort: -17.2%
/sort: -19.8%
3sort: -18.0%
+sort: -18.8%
%sort: -10.0%
~sort: -2.1%
=sort: -21.3%

As you can see, because there's a lot less non-comparison overhead in the
structured lists, the impact of the optimization is much greater, both in
performance gain and in worst-case cost. However, I would argue that these
data do not invalidate the utility of my patch: the probability of
encountering a type-heterogeneous list is certainly less than 5% in
practice. So the expected savings, even for structured lists, is something
like (5%)(-20%) + (95%)(50%) = 46.5%. And, of course, not *all* the lists
one encounters in practice are structured; certainly not *this* structured.
So, overall, I would say the numbers above are extremely encouraging.
Thanks for pointing out the need for this benchmark, though!

Elliot

Re: [Python-ideas] Py_SIZE of PyLongs

2016-10-19 Thread Elliot Gorokhovsky
It's in the code. See longobject.c:

Py_SIZE(v) = ndigits*sign;

You can also see Py_SIZE(v) used on PyLongs all over the place in
longobject.c, for example:

v = (PyLongObject *)vv;
i = Py_SIZE(v);

Just do a ctrl-f for Py_SIZE(v) in longobject.c. Like I said, by looking in
the implementation I was able to figure out that Py_SIZE is interpreted as
the sign times the number of digits (unless I'm missing something), but
this should be in the docs IMO.
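
Something like the following is what I'd want the docs to spell out (my
reading of the implementation, so treat it with care):

/* for a PyLongObject *v: */
Py_ssize_t size = Py_SIZE(v);       /* e.g. -3 for a negative 3-digit int */
Py_ssize_t ndigits = Py_ABS(size);  /* how many entries of ob_digit are used */
int sign = (size > 0) - (size < 0); /* -1, 0, or +1; 0 only for the int 0 */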


On Wed, Oct 19, 2016 at 7:24 PM Thomas Nyberg <tomuxi...@gmx.com> wrote:

> On 10/19/2016 09:04 PM, Elliot Gorokhovsky wrote:
> > A quick note:
> >
> > I'm working on a special-case compare function for bounded integers for
> > the sort stuff. By looking at the implementation, I figured out that
> > Py_SIZE of a long is the sign times the number of digits (...right?).
> > Before looking at the implementation, though, I had looked for this info
> > in the docs, and I couldn't find it anywhere. Since Py_SIZE is public, I
> > think the documentation should make clear what it returns for PyLongs,
> > for example somewhere on the "Integer Objects" page. Apologies if this
> > is specified somewhere else in the docs and I just couldn't find it.
> >
> > Elliot
>
> I don't think this is right.
>
>
> https://github.com/python/cpython/blob/master/Include/object.h#L119
> https://docs.python.org/3/c-api/structures.html#c.Py_SIZE
> https://docs.python.org/3/c-api/structures.html#c.PyVarObject
>
> It returns the `ob_size` fields of a PyVarObject. I think this has to do
> with objects with variable sizes like lists. PyLongs are not
> PyVarObjects because they have no notion of length.
>
> Why would a long be stored as a sequence of digits instead of a (say) 64
> bit integer as 8 bytes?
>
> Cheers,
> Thomas
>

[Python-ideas] Py_SIZE of PyLongs

2016-10-19 Thread Elliot Gorokhovsky
A quick note:

I'm working on a special-case compare function for bounded integers for the
sort stuff. By looking at the implementation, I figured out that Py_SIZE of
a long is the sign times the number of digits (...right?). Before looking
at the implementation, though, I had looked for this info in the docs, and
I couldn't find it anywhere. Since Py_SIZE is public, I think the
documentation should make clear what it returns for PyLongs, for example
somewhere on the "Integer Objects" page. Apologies if this is specified
somewhere else in the docs and I just couldn't find it.

Elliot

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 4:26 PM Terry Reedy  wrote:

> I suspect that optimizing string sorting
> will take some experimentation.  If the initial item is str, it might be
> worthwhile to record the highest 'kind' during the type scan, so that
> strncmp can be used if all are ascii or latin-1.
>

My thoughts exactly.

One other optimization along these lines: the reason ints don't give quite
as shocking results as floats is that comparisons are a bit more expensive:
one first has to check that the int would fit in a C long before comparing;
if not, then a custom procedure has to be used. However, in practice ints
being sorted are almost always smaller in absolute value than 2**32 or
whatever. So I think, just as it might pay off to check for latin-1 and use
strcmp, it may also pay off to check for fits-in-a-C-long and use a custom
function for that case as well, since the performance would be precisely as
awesome as the float performance that started this thread: comparisons
would just be the cost of pointer dereference plus the cost of C long
comparison, i.e. the minimum possible cost.
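
A sketch of what that custom function could look like (assumed names; the
pre-sort check would have to guarantee the size bound first):

static long
small_int_value(PyObject *o)
{
    Py_ssize_t size = Py_SIZE(o);  /* sign times number of digits */
    if (size == 0)
        return 0;
    long d = (long)((PyLongObject *)o)->ob_digit[0];
    return size < 0 ? -d : d;
}

static int
unsafe_int_lt(PyObject *v, PyObject *w)
{
    /* valid only because the pre-sort check proved every element is an
       int with |Py_SIZE| <= 1, i.e. it fits in a single internal digit */
    return small_int_value(v) < small_int_value(w);
}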

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
Ah. That makes a lot of sense, actually. Anyway, so then Latin1 strings are
memcmp-able, and others are not. That's fine; I'll just add a check for
that (I think there are already helper functions for this) and then have
two special-case string functions. Thanks!

On Wed, Oct 12, 2016 at 4:08 PM Alexander Belopolsky <
alexander.belopol...@gmail.com> wrote:

>
> On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky <
> elliot.gorokhov...@gmail.com> wrote:
>
> On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith <n...@pobox.com> wrote:
>
> But this isn't relevant to Python's str, because Python's str never uses
> UTF-8.
>
>
> Really? I thought in python 3, strings are all unicode... so what encoding
> do they use, then?
>
>
> No encoding is used.  The actual code points are stored as integers of the
> same size.  If all code points are less than 256, they are stored as 8-bit
> integers (bytes).  If some code points are greater or equal to 256 but less
> than 65536, they are stored as 16-bit integers and so on.
>

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith  wrote:

> But this isn't relevant to Python's str, because Python's str never uses
> UTF-8.
>

Really? I thought in python 3, strings are all unicode... so what encoding
do they use, then?

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
> So what's Python doing? Is it a codepoint ordering?
>

...ya...how is the python interpreter supposed to know what language
strings are in? There is a unique ordering of unicode strings defined by
the unicode standard, AFAIK.
If you want to sort by natural language ordering, see here:
https://pypi.python.org/pypi/natsort

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 3:39 PM Nathaniel Smith  wrote:

> It looks like PyUnicode_Compare already has a special case to use
> memcmp when both of the strings fit into latin1:
>

Wow! That's great! I didn't even try reading through unicode_compare,
because I felt I might miss some subtle detail that would break everything.
But ya, that's great, since surely latin1 is the most common use case. So
I'll just add a latin1 check in the check-loop, and then I'll have two
unsafe_unicode_compare functions. I felt bad about not being able to get
the same kind of string performance I had gotten with python2, so this is
nice.
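
The latin1 special case could then be as simple as this (a sketch; it
assumes the check-loop verified both strings are PyUnicode_1BYTE_KIND):

#include <string.h>

static int
unsafe_latin_lt(PyObject *v, PyObject *w)
{
    Py_ssize_t lv = PyUnicode_GET_LENGTH(v);
    Py_ssize_t lw = PyUnicode_GET_LENGTH(w);
    int cmp = memcmp(PyUnicode_1BYTE_DATA(v), PyUnicode_1BYTE_DATA(w),
                     lv < lw ? lv : lw);
    /* a strict prefix sorts first, so the shorter string wins ties */
    return cmp < 0 || (cmp == 0 && lv < lw);
}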

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Tue, Oct 11, 2016 at 9:56 PM Nick Coghlan  wrote:

> Once you get to the point of being able to do performance mentions on
> a CPython build with a modified list.sort() implementation, you'll
> want to take a look at the modern benchmark suite in
> https://github.com/python/performance
>

Yup, that's the plan. I'm going to implement optimized compares for tuples,
then implement this as a CPython build, and then run benchmark suites and
write some rigorous benchmarks using perf/timeit.

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Elliot Gorokhovsky
On Wed, Oct 12, 2016 at 9:20 AM Tim Peters  wrote:

> > What I'm *not* quite clear on is why Python 3's change to reject
> > comparisons between unrelated types makes this optimisation possible.
>
> It doesn't.  It would also apply in Python 2.  I simply expect the
> optimization will pay off more frequently in Python 3 code.  For
> example, in Python 2 I used to create lists with objects of wildly
> mixed types, and sort them merely to bring objects of the same type
> next to each other.  Things "like that" don't work at all in Python 3.
>
>
> > Surely you have to check either way? It's not that it's a particularly
> > important question - if the optimisation works, it's not that big a
> > deal what triggered the insight. It's just that I'm not sure if
> > there's some other point that I've not properly understood.
>

Yup. Actually, the initial version of this work was with Python 2. What
happened was this: I had posted earlier something along the lines of "hey
everybody let's radix sort strings instead of merge sort because that will
be more fun ok". And everyone wrote me back "no please don't are you
kidding". Tim Peters wrote back "try it but just fyi it's not gonna work".
So I set off to try it. I had never used the C API before, but luckily I
found some Python 2 documentation that gives an example of subclassing
list, so I was able to mostly just copy-paste to get a working list
extension module. I then copied over the implementation of listsort.

My first question was how expensive python compares are vs C compares. And
since python 2 has PyString_AS_STRING, which just gives you a char* pointer
to a C string, I went in and replaced PyObject_RichCompareBool with strcmp
and did a simple benchmark. And I was just totally blown away; it turns out
you get something like a 40-50% improvement (at least on my simple
benchmark).

So that was the motivation for all this. Actually, if I wrote this for
python 2, I might be able to get even better numbers (at least for
strings), since we can't use strcmp in python 3. (Actually, I've heard
UTF-8 strings are strcmp-able, so maybe if we go through and verify all the
strings are UTF-8 we can strcmp them? I don't know enough about how
PyUnicode stuff works to do this safely). My string special case currently
just bypasses the typechecks and goes to unicode_compare(), which is still
wayyy overkill for the common case of ASCII or Latin-1 strings, since it
uses a for loop to go through and check characters, and strcmp uses
compiler magic to do it in like, negative time or something. I even
PyUnicode_READY the strings before comparing; I'm not sure if that's really
necessary, but that's how PyUnicode_Compare does it.

Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-11 Thread Elliot Gorokhovsky
So I got excited here. And the reason why is that I got those numbers *on
Tim's benchmark*. When I got these kinds of numbers on my benchmarks, I
figured there was probably a problem with the way I was timing, and
certainly the gains couldn't be as extreme as they suggested. But this is
on a benchmark that's already in the codebase!


Here is a detailed explanation of how to reproduce my results, and the
circumstances that would lead them to be invalid:

**

To reproduce, just activate a virtualenv, and then clone
https://github.com/embg/python-fast-listsort.git. Then python setup.py
install and python sortperf.py.


Now let's look at what sortperf.py does and how it relates to Tim's
benchmark at Lib/test/sortperf.py. If you diff the two, you'll find I made
three changes:


1. I added an import, "import fastlist". This obviously would not make
sorting twice as fast.


2. I changed the way it formats the output: I changed "fmt = ("%2s %7s" + "
%7s"*len(cases))" to "fmt = ("%2s %7s" + " %6s"*len(cases))". Again
irrelevant.


3. I changed the timing function

#from this


def doit_fast(L):
    t0 = time.perf_counter()
    L.fastsort()
    t1 = time.perf_counter()
    print("%6.2f" % (t1-t0), end=' ')
    flush()



#to this


def doit(L):
    F = FastList(L)
    f0 = time.perf_counter()
    F.fastsort()
    f1 = time.perf_counter()
    F = FastList(L)
    t0 = time.perf_counter()
    F.sort()
    t1 = time.perf_counter()
    print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ')
    flush()


***

So what we've shown is that (1) if you trust the existing sorting benchmark
and (2) if my modification to doit() doesn't mess anything up (I leave this
up to you to judge), then the measurements are valid. Which is a pretty
big deal (50% !!!), hence my overexcitement.




Now I'd like to respond to the responses I've gotten (the one I'm thinking
of was off-list, so I don't want to quote it) questioning how it could be
possible for such a small optimization, bypassing the typechecks, to
have such a large effect, even in theory. Here's my answer:

Let's ignore branch prediction and cache for now and just look at a high
level. The cost of sorting is related to the cost of a single comparison,
because the vast majority of our time (let's say certainly at least 90%,
depending on the list) is spent in comparisons. So let's look at the cost
of a comparison.

Without my optimization, comparisons for floats (that's what this benchmark
looks at) go roughly like this:

1. Test type of left and right for PyObject_RichCompare (which costs two
pointer dereferences) and compare them. "3 ops" (quotes because counting
ops like this is pretty hand-wavy). "2 memory accesses".

2. Get the address of the float compare method from
PyFloat_Type->tp_richcompare. "1 op". "1 memory access".

3. Call the function whose address we just got. "1 op". "Basically 0 memory
accesses because we count the stack stuff in that 1 op".

4. Test type of left and right again in PyFloat_RichCompare and compare
both of them to PyFloat_Type. "4 ops". "2 memory accesses".

5. Get floats from the PyObject* by calling PyFloat_AS_DOUBLE or whatever.
"2 ops". "2 memory accesses".

6. Compare the floats and return. "2 ops".

Now let's tally the "cost" (sorry for use of quotes here, just trying to
emphasize this is an intuitive, theoretical explanation for the numbers
which doesn't take into account the hardware):
"13 ops, 7 memory accesses".

Here's what it looks like in my code:

1. Call PyFloat_AS_DOUBLE on left and right. "2 ops". "2 memory accesses".

2. Compare the floats and return. "2 ops".

Tally: "4 ops, 2 memory accesses".

Now you can argue branch prediction alleviates a lot of this cost, since
we're taking the same branches every time. But note that, branch prediction
or not, we still have to do all of those memory accesses, and since they're
pointers to places all over memory, they miss the cache basically every
time (correct me if I'm wrong). So memory-wise, we really are doing
something like a 7:2 ratio, and op-wise, perhaps not as bad because of
branch prediction, but still, 13:4 is probably bad no matter what's going
on in the hardware.

Now consider that something like 90% of our time is spent in those steps.
Are my numbers really that unbelievable?

Thanks for everything, looking forward to writing this up as a nice latex
doc with graphs and perf benchmarks and all the other rigorous goodies, as
well as a special case cmp func for homogeneous tuples and a simple patch
file,

Elliot

[Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-11 Thread Elliot Gorokhovsky
Warning: the contents of this message may be dangerous for readers with
heart conditions.

So today I looked at PyFloat_RichCompare. I had been scared initially
because it was so complicated, I was worried I would miss some important
error check or something if I special cased it. So I took the following
approach: I replaced all statements of the form PyLong_Check(w) with 0, and
all statements of the form PyFloat_Check(w) with 1. Because that's the
whole point; we're cutting out type checks. I then cut out all blocks
preceded by if (0). So it's definitely safe.

And it turns out that if you do that, PyFloat_RichCompare becomes ONE LINE
(after we set op=Py_LT)!!! Just

return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) == 0 ? Py_False :
Py_True;

So I thought, wow, this will give some nice numbers! But I underestimated
the power of this optimization. You have no idea. It's crazy.

Since we're dealing with floats, I used Tim Peter's benchmark:
Lib/test/sortperf.py, just modifying one function:

#Old function Tim Peters wrote:
def doit_fast(L):
    t0 = time.perf_counter()
    L.fastsort()
    t1 = time.perf_counter()
    print("%6.2f" % (t1-t0), end=' ')
    flush()

#My function:
def doit(L):
    F = FastList(L)
    f0 = time.perf_counter()
    F.fastsort()
    f1 = time.perf_counter()
    F = FastList(L)
    t0 = time.perf_counter()
    F.sort()
    t1 = time.perf_counter()
    print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ')
    flush()

So the benchmarking here is valid. I didn't write it. All I did was modify
it to print percent improvement instead of sort time. The numbers below are
percent improvement of my sort vs default sort (just clone my repo and run
python sortperf.py to verify):

 i     2**i   *sort   \sort   /sort   3sort   +sort   %sort   ~sort   =sort   !sort
15    32768  44.11%  54.69%  47.41%  57.61%  50.17%  75.24%  68.03%  65.16%  82.16%
16    65536  14.14%  75.38%  63.44%  56.56%  67.99%  66.19%  50.72%  61.55%  61.87%
17   131072  73.54%  60.52%  60.97%  61.63%  52.55%  49.84%  68.68%  84.12%  65.90%
18   262144  54.19%  55.34%  54.67%  54.13%  55.62%  52.88%  69.30%  74.86%  72.66%
19   524288  55.12%  53.32%  53.77%  54.27%  52.97%  53.53%  67.55%  76.60%  78.56%
20  1048576  55.05%  51.09%  60.05%  50.69%  62.98%  50.20%  66.24%  71.47%  61.40%

This is just insane. This is crazy. I didn't write this benchmark, OK, this
is a valid benchmark. Tim Peters wrote it. And except for one trial, they
all show more than 50% improvement. Some in the 70s and 80s. This is crazy.
This is so cool. I just wanted to share this with you guys. I'll submit a
patch to bugs.python.org soon; I just have to write a special case
comparison for tuples and then I'm basically done. This is so cool!
50% man! Crazy!

Elliot

On Mon, Oct 10, 2016 at 9:02 PM Tim Peters  wrote:

> [please restrict follow-ups to python-ideas]
>
> Let's not get hung up on meta-discussion here - I always thought "massive
> clusterf**k" was a precise technical term anyway ;-)
>
> While timing certainly needs to be done more carefully, it's obvious to me
> that this approach _should_ pay off significantly when it applies.
> Comparisons are extraordinarily expensive in Python, precisely because of
> the maze of test-and-branch code it requires just to figure out which
> bottom-level comparison function to invoke each time.  That's why I spent
> months of my life (overall) devising a sequence of sorting algorithms for
> Python that reduced the number of comparisons needed.
>
> Note that when Python's current sort was adopted in Java, they still kept
> a quicksort variant for "unboxed" builtin types.  The adaptive merge sort
> incurs many overheads that often cost more than they save unless
> comparisons are in fact very expensive compared to the cost of pointer
> copying (and in Java comparison of unboxed types is cheap).  Indeed, for
> native numeric types, where comparison is dirt cheap, quicksort generally
> runs faster than mergesort despite that the former does _more_ comparisons
> (because mergesort does so much more pointer-copying).
>
> I had considered something "like this" for Python 2, but didn't pursue it
> because comparison was defined between virtually any two types (34 < [1],
> etc), and people were careless about that (both by design and by
> accident).  In Python 3, comparison "blows up" for absurdly mixed types, so
> specializing for homogeneously-typed lists is a more promising idea on the
> face of it.
>
> The comparisons needed to determine _whether_ a list's objects have a
> common type is just len(list)-1 C-level pointer comparisons, and so goes
> fast.  So I expect that, when it applies, this would speed even sorting an
> already-ordered list with at least 2 elements.
>
> For a mixed-type list with at least 2 elements, it will always be pure
> loss.  But (a) I expect such lists are uncommon (and especially uncommon in
> Python 3); and (b) a one-time scan doing C-level pointer comparisons 

Re: [Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
Oh no, the idea here is just that you would copy over the floats associated
with the PyObject* and keep them in an array of such structs, so that we
know which PyObject* are associated with which floats. Then, after the
standard library quicksort sorts them, you would copy the PyObject* into
the list. So you sort the PyObject* keyed by the floats. Anyway, I think
the copying back and forth would probably be too expensive; it's just an
idea.

Also, I apologize for the formatting of my last email; I didn't realize
Inbox would mess up the quoting like that. I'll ensure I use plain-text
quotes from now on.

On Mon, Oct 10, 2016 at 9:38 PM Chris Angelico <ros...@gmail.com> wrote:

> On Tue, Oct 11, 2016 at 2:29 PM, Elliot Gorokhovsky
> <elliot.gorokhov...@gmail.com> wrote:
> > Ya, I think this may be a good approach for floats: if the list is all
> > floats, just copy all the floats into a seperate array, use the standard
> > library quicksort, and then construct a sorted PyObject* array. Like
> maybe
> > set up a struct { PyObject* payload, float key } type of deal.
>
> Not quite sure what you mean here. What is payload, what is key? Are
> you implying that the original float objects could be destroyed and
> replaced with others of equal value? Python (unlike insurance claims)
> guarantees that you get back the exact same object as you started
> with.
>
> ChrisA

[Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
Thanks for looking at this!

> That's why I spent months of my life (overall) devising a sequence of
> sorting algorithms for Python that reduced the number of comparisons needed.


Yes, that's why I think this is so cool: for a couple dozen lines of code,
we can get (at least for some cases, according to my questionable
benchmarks) the kinds of massive improvements you had to use actual
computer science to achieve (as opposed to mere hackery).


> Note that when Python's current sort was adopted in Java, they still kept a
> quicksort variant for "unboxed" builtin types.  The adaptive merge sort
> incurs many overheads that often cost more than they save unless
> comparisons are in fact very expensive compared to the cost of pointer
> copying (and in Java comparison of unboxed types is cheap).  Indeed, for
> native numeric types, where comparison is dirt cheap, quicksort generally
> runs faster than mergesort despite that the former does _more_ comparisons
> (because mergesort does so much more pointer-copying).


Ya, I think this may be a good approach for floats: if the list is all
floats, just copy all the floats into a seperate array, use the standard
library quicksort, and then construct a sorted PyObject* array. Like maybe
set up a struct { PyObject* payload, float key } type of deal. This
wouldn't work for strings (unicode is scary), and probably not for ints
(one would have to check that all the ints are within C long bounds).
Though on the other hand perhaps this would be too expensive?
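
For concreteness, that struct idea might look like this (purely an
illustrative sketch, not part of the patch):

typedef struct {
    PyObject *payload;  /* the original float object, so identity is kept */
    double key;         /* its value, copied out once up front */
} keyed_item;

static int
cmp_keyed(const void *a, const void *b)
{
    double x = ((const keyed_item *)a)->key;
    double y = ((const keyed_item *)b)->key;
    return (x > y) - (x < y);  /* avoids overflow; 0 on equality */
}

/* qsort(items, n, sizeof(keyed_item), cmp_keyed); then write each
   items[i].payload back into the list, now in sorted order */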

> I had considered something "like this" for Python 2, but didn't pursue it
> because comparison was defined between virtually any two types (34 < [1],
> etc), and people were careless about that (both by design and by
> accident).  In Python 3, comparison "blows up" for absurdly mixed types, so
> specializing for homogeneously-typed lists is a more promising idea on the
> face of it.
>
> The comparisons needed to determine _whether_ a list's objects have a
> common type is just len(list)-1 C-level pointer comparisons, and so goes
> fast.  So I expect that, when it applies, this would speed even sorting an
> already-ordered list with at least 2 elements.

That's what my crude benchmarks indicate... when I applied my sort to a
list of 1e7 ints with a float tacked on the end, my sort actually ended up
being a bit faster over several trials (which I attribute to
PyObject_RichCompare == Py_True being faster than PyObject_RichCompareBool
== 1, apologies for any typos in that code).


> For a mixed-type list with at least 2 elements, it will always be pure
> loss.  But (a) I expect such lists are uncommon (and especially uncommon in
> Python 3); and (b) a one-time scan doing C-level pointer comparisons until
> finding a mismatched type is bound to be a relatively tiny cost compared to
> the expense of all the "rich comparisons" that follow.
>
> So +1 from me on pursuing this.
>
> Elliot, please:
>
> - Keep this on python-ideas.  python-dev is for current issues in Python
> development, not for speculating about changes.
>
> - Open an issue on the tracker:  https://bugs.python.org/


OK


> - At least browse the info for developers:
> https://docs.python.org/devguide/


Ya, I'm working on setting this up as a patch in the hg repo as opposed to
an extension module to make benchmarking cleaner/more sane.


> - Don't overlook Lib/test/sortperf.py.  As is, it should be a good test of
> what your approach so far _doesn't_ help, since it sorts only lists of
> floats (& I don't think you're special-casing them).  If the timing results
> it reports aren't significantly hurt (and I expect they won't be), then add
> specialization for floats too and gloat about the speedup :-)

Ya, I mean they aren't special-cased, but homogeneous lists of floats still
fit in the tp_richcompare case, which still bypasses the expensive
PyObject_RichCompare. I'll guess I'll see when I implement this as a patch
and can run it on sortperf.py.


> - I expect tuples will also be worth specializing (complex sort keys are
> often implemented as tuples).


I'm not sure what you mean here... I'm looking at the types of lo.keys, not
of saved_ob_item (I think I said that earlier in this thread by mistake
actually). So if someone is passing tuples and using itemgetter to extract
ints or strings or whatever, the current code will work fine; lo.keys will
be scalar types. Unless I misunderstand you here. I mean, when would
lo.keys actually be tuples?


> Nice start! :-)

Thanks!