Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Tim Peters
[Tim Delaney ]
> Isn't there already always a scan of the iterable to build the keys array
> for sorting (even if no key keyword param is specified)?

No - `.sort()` is a list method, and as such has nothing to do with
arbitrary iterables, just lists (perhaps you're thinking of the
`sorted()` function?).  If no `key=` argument is passed, the list guts
itself is used (as-is) as the vector of keys.


> In which case adding the homogeneity check there seems like it shouldn't
> add much overhead at all (I have to say that I was surprised with 10+%
> reductions in speed in some of the heterogeneous TimSort tests for this
> reason).

Those are worst cases where the current sort does very few compares
(like just N-1 for a list of length N).  Because they do "amazingly"
few compares, they're already "amazingly" fast.  And they also do
little data movement (e.g., none at all for /sort & =sort, and N//2
pointer swaps for \sort).  Because of that, any new O(N) overhead would
make them significantly slower - unless the new overhead pays off by
allowing a larger time saving than it costs (as it does when the list
is same-type).
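
A quick way to see those counts, if you're curious -- a toy sketch (not
from the patch) that wraps elements so __lt__ calls are counted, then
sorts already-ordered inputs:

    class Counted:
        calls = 0
        def __init__(self, value):
            self.value = value
        def __lt__(self, other):
            Counted.calls += 1
            return self.value < other.value

    for name, data in [("/sort", range(1000)),           # ascending
                       ("=sort", [7] * 1000),            # all equal
                       ("\\sort", range(1000, 0, -1))]:  # descending
        Counted.calls = 0
        sorted(map(Counted, data))
        print(name, Counted.calls)  # N-1 = 999 compares each time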

There is a "natural" place to insert "same type?" checks:  the outer
loop of the sort marches over the vector once, left to right,
alternately identifying the next natural run, then possibly extending
it and/or merging it into previous runs.  The checks could be put
there instead, but the code would be ugly and more invasive - I
wouldn't even bother trying it.
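
For context, a pure-Python sketch of the shape of that run detection (the
real code is count_run() in Objects/listobject.c):

    def next_run_end(a, lo):
        # Return the index one past the natural run starting at a[lo].
        hi = lo + 1
        if hi == len(a):
            return hi
        if a[hi] < a[lo]:        # strictly descending run
            while hi + 1 < len(a) and a[hi + 1] < a[hi]:
                hi += 1
        else:                    # non-descending run
            while hi + 1 < len(a) and not (a[hi + 1] < a[hi]):
                hi += 1
        return hi + 1

A per-element type check inserted into that loop is exactly the sort of
invasive extra branching meant above.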


> And could specific richcompares be refactored so there was a "we really know
> what the types are, no need to check" version available to sort() (with
> the typechecking version available for general use/unoptimised sorting)?

They're not _just_ type-aware.  For example, the special function for
ints is specialized to integers that happen to fit in one (internal)
"digit", and the special function for strings is specialized to those
that happen to be stored in PyUnicode_1BYTE_KIND format.  Adding such
stuff to the public C API would be ... well, for a start, tedious ;-)
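
To give the flavor in Python (a hedged sketch only -- the real checks poke
at CPython internals like ob_digit counts and PyUnicode kinds, and the
one-digit bound assumes 30-bit digits):

    def fits_one_digit(n):    # the int special case's precondition
        return type(n) is int and -(2**30) < n < 2**30

    def is_latin1_str(s):     # the string special case's precondition
        return type(s) is str and all(ord(c) < 256 for c in s)
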
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Chris Angelico
On Fri, Mar 10, 2017 at 11:29 AM, Erik  wrote:
> On 09/03/17 23:04, Spencer Brown wrote:
>>
>> Might make more sense to be dict.default(int), that way it doesn't
>> have redundant dict names.
>
>
> I thought that, too.
>
>> since you could do {1:2, 3:4}.default(int)
>
>
> Could you?
>
> Python 3.6.0 (default, Mar  9 2017, 00:43:06)
> [GCC 5.4.0 20160609] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> type(dict())
> <class 'dict'>
> >>> type({})
> <class 'dict'>
> >>> type(dict)
> <class 'type'>
>
> The thing bound to the name 'dict' is not the same as the object returned by
> _calling_ 'dict'.

Yes, you could; it'd be a classmethod, like dict.fromkeys. IMO it
should just ignore any instance argument - same as you see here:

>>> dict.fromkeys(range(3))
{0: None, 1: None, 2: None}
>>> {1:2,3:4}.fromkeys(range(3))
{0: None, 1: None, 2: None}
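
A sketch of why the instance is ignored: a classmethod receives the class,
not the instance, so the hypothetical dict.default would behave the same
either way (shown here via a subclass, since dict.default doesn't exist):

    import collections

    class MyDict(dict):
        @classmethod
        def default(cls, factory):
            return collections.defaultdict(factory)

    d1 = MyDict.default(int)          # called on the class
    d2 = MyDict({1: 2}).default(int)  # called on an instance: same result
    d1['x'] += 1                      # no KeyError; missing keys -> 0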

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Chris Barker
On Thu, Mar 9, 2017 at 2:04 AM, Barry Scott  wrote:

> It was not clear to me that you need to scan the list at the start to make
> sure it's homogeneous. Given that the type check is so cheap, will it
> slow the sort if you do the pointer check in the compare code? I am not
> suggesting you run rich compare full fat on each compare.
>

I think you have a point here.

IIUC, the current code makes no assumptions about type homogeneity. So it
must do something generic at each compare. Perhaps that generic thing (of
course, Elliot knows what it is) does do a pointer compare, and then
something smart and fast for some built-ins. But there are still a few
steps:

these are both the same type
what type are they
how do I compare them?
do the compare

These may well all be fast, but it's still a few steps, and they have to be
done O(n*log(n)) times
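
A rough pure-Python rendering of those steps (a sketch only -- the real
logic is C code like PyObject_RichCompareBool in Objects/object.c):

    import operator

    def generic_lt(v, w):
        # repeated on *every* compare in the generic path:
        if type(v) is type(w):          # these are both the same type
            lt = type(v).__lt__         # what type / how do I compare
            result = lt(v, w)           # do the compare
            if result is not NotImplemented:
                return result
        return operator.lt(v, w)        # heterogeneous: full protocol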

IIUC, Elliot's patch does something like:

First go through the whole list and do:
  What is the type of the first item (only once)
  How do I compare that type (only once)
  Is everything else in the list the same type ( O(n) )

Then the actual sort:
 - do the compare ( O(n*log(n)) )

So there is one operation done n times, and one done n*log(n) times, rather
than 4 done n*log(n) times. If the constants are about the same, that saves
3*n*log(n) - n operations -- or, if n is non-trivial, roughly 3*n*log(n)
operations.

So we go from 4*n*log(n) to n*log(n): about a 4-times speed-up -- in theory.
With all the other complications of computer performance, only profiling
will tell you! (Also, it's not 4 times on the whole sort -- just the
compare part -- you still need to shuffle the values around, which
presumably takes at least as long as each of the above operations.)

(Not sure about all four of those steps, it may be only three -- but still
a 3-times speed-up.)

Now Barry's idea:

Assume that the list is homogeneous, and crap out if it turns out it's not:

Start off:
  What is the type of the first item (only once)
  How do I compare that type (only once)
Do the sort:
  Is the next item the correct type? O(n*log(n))
  Do the compare ( O(n*log(n)) )

So we now have 2 operations to be run O(n*log(n)) times -- not as good as
only one, but still better than the 3 or 4 of the fully general sort. And
this would have less impact on the non-homogeneous case.
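
A sketch of that check-as-you-go variant (illustrative names only):

    def make_guarded_lt(expected_type, fast_lt, bail_out):
        def guarded_lt(v, w):
            # two type checks ride along inside every compare:
            if type(v) is not expected_type or type(w) is not expected_type:
                bail_out()          # "crap out": revert to the generic sort
            return fast_lt(v, w)
        return guarded_lt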

Though Elliot points out that this would be much harder to branch-predict,
etc... so maybe not.

Maybe worth trying out and profiling??

NOTE: I really have no idea what really goes on in that code -- so I may
have this all wrong...

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Chris Barker
> > If we really want to make defaultdict feel more "builtin" (and I don't see
> > any reason to do so), I'd suggest adding a factory function:
> >
> > dict.defaultdict(int)

> Nice.

I agree -- what about:

dict.sorteddict() ??

make easy access to various built-in dict variations...
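
A hypothetical sketch of such a factory, via a subclass (dict.sorteddict
doesn't exist, so defaultdict stands in as the concrete case):

    import collections

    class Dict(dict):
        @classmethod
        def defaultdict(cls, factory, *args, **kwargs):
            return collections.defaultdict(factory, *args, **kwargs)

    counts = Dict.defaultdict(int)
    counts["spam"] += 1     # no KeyError; missing keys default to 0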

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Tim Delaney
On 9 March 2017 at 21:04, Barry Scott  wrote:

> It was not clear to me that you need to scan the list at the start to make
> sure it's homogeneous. Given that the type check is so cheap, will it
> slow the sort if you do the pointer check in the compare code? I am not
> suggesting you run rich compare full fat on each compare.
>

Isn't there already always a scan of the iterable to build the keys array
for sorting (even if no key keyword param is specified)? In which case
adding the homogeneity check there seems like it shouldn't add much overhead
at all (I have to say that I was surprised with 10+% reductions in speed in
some of the heterogeneous TimSort tests for this reason).
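
A sketch of that piggyback idea, assuming a key= sort (where a key-building
pass really does exist -- see Tim's reply earlier in this digest for the
no-key case):

    def build_keys_checking_homogeneity(lst, key):
        keys = [key(x) for x in lst]        # the existing O(n) pass
        t = type(keys[0]) if keys else None
        homogeneous = all(type(k) is t for k in keys)
        return keys, homogeneous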

And could specific richcompares be refactored so there was a "we really
know what the types are, no need to check" version available to sort()
(with the typechecking version available for general use/unoptimised
sorting)?

Tim Delaney
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Barry Warsaw
On Mar 08, 2017, at 05:49 PM, Eric V. Smith wrote:

>If we really want to make defaultdict feel more "builtin" (and I don't see
>any reason to do so), I'd suggest adding a factory function:
>
>dict.defaultdict(int)
>
>Similar in spirit to dict.fromkeys(), except of course returning a
>defaultdict, not a dict.

Nice.

-Barry


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Submitted a PR!

2017-03-09 Thread Elliot Gorokhovsky
Just submitted a PR implementing this:
https://github.com/python/cpython/pull/582
-- just need someone to review it now :)

Thanks for all your feedback, everyone!

On Sun, Mar 5, 2017 at 12:19 AM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:

> (Summary of results: my patch at https://bugs.python.org/issue28685 makes
> list.sort() 30-50% faster in common cases, and at most 1.5% slower in the
> uncommon worst case.)
>
> Hello all,
>
> You may remember seeing some messages on here about optimizing list.sort()
> by exploiting type-homogeneity: since comparing apples and oranges is
> uncommon (though possible, e.g. float to int), it pays off to check if the
> list is type-homogeneous (as well as homogeneous with respect to some other
> properties), and, if it is, to replace calls to PyObject_RichCompareBool
> with calls to ob_type->tp_richcompare (or, in common special cases, to
> optimized compare functions). The entire patch is less than 250 lines of
> code, most of which is pretty boilerplate (e.g. a lot of assertions in
> #ifdef Py_DEBUG blocks, etc).
>
> I originally wrote that patch back in November. I've learned a lot since
> then, both about CPython and about mailing list etiquette :). Previous
> discussion about this can be found at
> https://mail.python.org/pipermail/python-dev/2016-October/146648.html and
> https://mail.python.org/pipermail/python-ideas/2016-October/042807.html.
>
> Anyway, I recently redid the benchmarks much more rigorously (in
> preparation for presenting this project at my school's science fair),
> achieving a standard deviation of less than 0.5% of the mean for all
> measurements. The exact benchmark script used can be found at
> https://github.com/embg/python-fastsort-benchmark (it's just sorting
> random lists of [type], or lists of tuples of [type]. While listsort.txt
> talks about benchmarking different kinds of structured lists, rather than
> just random lists, the results here would hold in those cases just as
> well, because this patch makes individual comparisons cheaper, instead of
> reducing the number of comparisons based on structure).
>
> I also made a poster describing the optimization and including a pretty
> graph displaying the benchmark data:
> https://github.com/embg/python-fastsort-benchmark/blob/master/poster.pdf.
> For those who would rather read the results here (though it is a *really*
> pretty graph):
>
> ***
> Percent improvement for sorting random lists of [type]
> (1-patched/unpatched):
> float: 48%
> bounded int (magnitude smaller than 2^32): 48.4%
> latin string (all characters in [0,255]): 32.7%
> general int (reasonably uncommon?): 17.2%
> general string (reasonably uncommon?): 9.2%
> tuples of float: 63.2%
> tuples of bounded int: 64.8%
> tuples of latin string: 55.8%
> tuples of general int: 50.3%
> tuples of general string: 44.1%
> tuples of heterogeneous: 41.5%
> heterogeneous (lots of float with an int at the end; worst-case): -1.5%
> ***
>
> Essentially, it's a gamble where the payoff is 20-30 times greater than
> the cost, and the odds of losing are very small. Sorting is perhaps not a
> bottleneck in most applications, but considering how much work has gone
> into Python's sort (Timsort, etc; half of listobject.c is sort code), I
> think it's interesting that list.sort() can be made essentially twice
> faster by a relatively simple optimization. I would also add that Python
> dictionaries already implement this optimization: they start out optimizing
> based on the assumption that they'll only be seeing string keys, checking
> to make sure that assumption holds as they go. If they see a non-string
> key, they permanently switch over to the general implementation. So it's
> really the same idea, except here it doesn't matter as much what type we're
> dealing with, which is important, because lists are commonly used with lots
> of different types, as opposed to dictionaries, which overwhelmingly
> commonly use string keys, especially internally. (Correct me if I'm wrong
> in any of the above).
>
> I got a lot of great feedback/input on this patch as I was writing it, but
> after submitting it, I didn't hear much from anybody. (The reason I took so
> long to post was because I wanted to wait until I had the chance to do the
> benchmarks *right*). What do you all think?
>
> Thanks,
> Elliot
>
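
A pure-Python sketch of that "assume one type, switch permanently on the
first violation" pattern (hypothetical names; CPython does this in C via
lookdict_unicode and lookdict):

    class OptimisticLookup:
        def __init__(self):
            self.lookup = self._string_only     # start specialized

        def _string_only(self, d, key):
            if type(key) is not str:
                self.lookup = self._general     # permanent switch
                return self._general(d, key)
            return d.get(key)                   # string-assuming fast path

        def _general(self, d, key):
            return d.get(key)                   # fully generic path
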
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Elliot Gorokhovsky
On Thu, Mar 9, 2017 at 3:04 AM Barry Scott  wrote:

>
> So you do O(n log n)*2 pointer compares with my suggestion, it seems? Which
> is smaller, the O(n) pointer checks?
>

Not sure what you mean here... pretty sure your inequality is backwards?

Look -- the point is, we already *do* the pointer checks during every
compare. That's the current implementation. I believe my benchmarks are
sufficient to prove that my patch is much faster than the current
implementation. Which makes sense, because we do one type check per object,
as opposed to something like log n per object, and we do them all at once,
which I do believe is faster than doing them across calls.

Now, you're not entirely wrong -- the current implementation doesn't do the
type checks as efficiently as it could. Specifically, it does them once in
PyObject_RichCompareBool, and then *again* in ob_type->tp_richcompare
(which it has to for safety: ob_type->tp_richcompare doesn't know the
arguments have already been type-checked!) You could totally define
optimized compares that do the type-checks more efficiently, and you would
see a benefit, just not nearly as extreme as the benefit I demonstrate.
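
To make the contrast concrete, a toy sketch (min as a stand-in for the
sort loop; names are illustrative, not the patch's):

    def homogeneous_min(lst):
        t = type(lst[0])
        assert all(type(x) is t for x in lst)  # pre-scan: O(n), once
        lt = t.__lt__                          # dispatch resolved once
        best = lst[0]
        for x in lst[1:]:
            if lt(x, best):                    # no per-call type checks
                best = x
        return best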

Thanks again for your feedback!
Elliot
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Barry Scott

> On 8 Mar 2017, at 22:58, Elliot Gorokhovsky wrote:
> 
> On Wed, Mar 8, 2017 at 2:14 PM Barry wrote:
> Can you assume that the list is of type(list[0]) and use that type's optimised
> sort?
> But in the optimised sort code check that the types are as required.
> If you hit an element that is not of the required type then fall back to the 
> unoptimised sort.
> 
> Well, how would you tell if you've hit an element that is not of the required 
> type? You'd have to check during every compare, right? And that's precisely 
> what we're trying to avoid!

It seems the trick for the optimisation is to compare the type pointer of
an object to see if it's the same as a type supported by the chosen optimised
sort.

It was not clear to me that you need to scan the list at the start to make sure
it's homogeneous. Given that the type check is so cheap, will it
slow the sort if you do the pointer check in the compare code? I am not
suggesting you run rich compare full fat on each compare.

> The whole point of my patch is that we do O(nlogn) compares, but only have 
> O(n) elements, so it's much cheaper to do all the type checks in advance, and 
> in the very likely case that our list is homogeneous, switch to an optimized 
> special-case compare function.

So you do O(n log n)*2 pointer compares with my suggestion, it seems? Which is
smaller, the O(n) pointer checks?

> 
> Even when we only do O(n) compares, my patch is still much faster (see 
> benchmarks higher up in this thread). Why? Well, if you're doing the type 
> checks during the compares, you're doing them across different function 
> calls, with other stuff interspersed in between. So pipeline/branch 
> prediction/caching is less able to optimize away the overhead of the safety 
> checks (I don't know how CPUs work, but one of those is relevant here). With 
> the pre-sort check, it's all in a single iteration through the list, and 
> we're taking the same exact branch every time; it's much faster. Think 
> iterating over a matrix row-wise versus iterating column-wise (not exactly 
> relevant here since that's about cache, but that's the idea. Again, I don't 
> really know how CPUs work).

Provided the code is small, I think both versions of the algorithm will benefit
from cache and branch prediction.

> 
> So in short: no, we can't just check as we go, because that's what we're 
> already doing. But we can optimize significantly by checking in advance. I 
> mean, practically speaking, the advance check is basically free. The 
> compare-time checks, in sum, are not, both asymptotically and practically.

I'm not clear this is true. But I have not read the code.

Barry
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/