Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Chris Barker Thu, 09 Mar 2017 15:21:12 -0800

On Thu, Mar 9, 2017 at 2:04 AM, Barry Scott <[email protected]> wrote:


> It was not clear to me that you need to scan the list at the start to make
> sure its homogeneous. Given that the type check is so cheap will it
> slow the sort if you do the pointer check in the compare code? I am not
> suggesting you run rich compare full fat on each compare.
>

I think you have a point here.

IIUC, the current code makes no assumptions about type homogeneity. So it
must do something generic at each compare. Perhaps that generic thing (of
course, Elliot knows that is) does do a pointer compare, and then something
smart and fast for some built-ins. but there is still a few steps:

these are both the same type
what type are they
how do I compare them?
do the compare

These may well all be fast, but it's still a few steps, and have to be done
O(n*log(n)) times

IIUC, Elliot's patch does something like:

First go through the whole list and do:
  What is the type of the first item (only once)
  How do I compare that type (only once)
  Is everything else in the list the same type ( O(n) )

Then the actual sort:
 - do the compare ( O(n*log(n)) )

So there is one operation done n times, and one done n*log(n) times, rather
than 4 done n*log(n) times, if the constants are about the same, that saves
3n*log(n) - n operations, or -- if n is non-trivial, then n*3*log(n)
operations

so we go from 4*n*log(n) to n*log(n) about a 4 times speed up -- in theory.
with all the other complications of computer performance, only profiling
will tell you! (also, it's not 4 times on the whole sort -- just the
compare part -- you still need to shuffle the values around, which
presumably takes at last as long as each of the above operations)

(not sure about all four of those steps, it may be only three -- but still
a 3 time speed up)

Now Barry's idea:

Assume that the list is homogenous, and crap out if it turns out it's not:

Start off:
  What is the type of the first item (only once)
  How do I compare that type (only once)
Do the sort:
  Is the next item the correct type? O(n*log(n))
  Do the compare ( O(n*log(n)) )

So we now have 2 operations to be run O(n*log(n)) -- so not as good at only
one, but still better than the 3 or 4 of the fully general sort. And this
would have less impact on the non-homogenous case.

Though Elliot points out that this would be much harder to branch-predict,
etc... so maybe not.

Maybe worth trying out and profiling??

NOTE: I really have no idea what really goes in in that code -- so I may
have this all wrong...

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]

_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Reply via email to