Re: [Python-Dev] [Python-checkins] cpython: Fixes test_getargs2 to get the buildbots working again.

2016-09-11 Thread Steve Dower

On 11Sep2016 1959, Martin Panter wrote:

On 12 September 2016 at 02:48, Steve Dower  wrote:

  Fixes test_getargs2 to get the buildbots working again.


I'm not sure this is the fix we want to keep here, but it was sufficient to
get the test going and unblock all the buildbots.

I'm not entirely sure when the break appeared (essentially we seem to not be
copying *args into a new tuple), but I'd guess it's to do with the fast
calling improvements.


That seems to be everyone else’s guess too. See
https://bugs.python.org/issue28086 (bug about this failure)
https://bugs.python.org/issue27213 (bisected cause)



Huh, I searched and didn't find anything. Maybe I typo'd my search query?

Looking at the bisected cause, it seems like the intent is to allow 
subclasses of tuple to pass through. Considering this is seriously going 
to hold up beta 1, I'd rather assume that's the intent and unblock the 
release.


Cheers,
Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython: Fixes test_getargs2 to get the buildbots working again.

2016-09-11 Thread Martin Panter
On 12 September 2016 at 02:48, Steve Dower  wrote:
>>   Fixes test_getargs2 to get the buildbots working again.
>
> I'm not sure this is the fix we want to keep here, but it was sufficient to
> get the test going and unblock all the buildbots.
>
> I'm not entirely sure when the break appeared (essentially we seem to not be
> copying *args into a new tuple), but I'd guess it's to do with the fast
> calling improvements.

That seems to be everyone else’s guess too. See
https://bugs.python.org/issue28086 (bug about this failure)
https://bugs.python.org/issue27213 (bisected cause)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP520 and absence of __definition_order__

2016-09-11 Thread Nick Coghlan
On 11 September 2016 at 13:05, Nick Coghlan  wrote:
> VOC & Batavia *should* be OK (worst case, they return
> collections.OrderedDict from __prepare__ and also use it for __dict__
> attributes), but I'm less certain about MicroPython (since I don't
> know enough about how its current dict implementation works to know
> whether or not they'll be able to make the same change PyPy and
> CPython did)

Micropython' s Damien George got back to me and indicated that once
they get around to working on Python 3.6 compatibility (they're
currently still working on 3.5), they'd likely also need to go down
the path of using collections.OrderedDict in the situations where the
3.6 language spec calls for it (MicroPython's default dict
implementation is less sparse than the CPython one, trading greater
memory usage efficiency for an increased risk of hash collisions, so
it's unlikely the new implementation would count as "compact" from
that perspective).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython: Fixes test_getargs2 to get the buildbots working again.

2016-09-11 Thread Steve Dower

On 11Sep2016 1944, steve.dower wrote:

https://hg.python.org/cpython/rev/7793d34609cb
changeset:   103679:7793d34609cb
user:Steve Dower 
date:Sun Sep 11 19:43:51 2016 -0700
summary:
  Fixes test_getargs2 to get the buildbots working again.

files:
  Lib/test/test_getargs2.py |  2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/Lib/test/test_getargs2.py b/Lib/test/test_getargs2.py
--- a/Lib/test/test_getargs2.py
+++ b/Lib/test/test_getargs2.py
@@ -471,7 +471,7 @@

 ret = get_args(*TupleSubclass([1, 2]))
 self.assertEqual(ret, (1, 2))
-self.assertIs(type(ret), tuple)
+self.assertIsInstance(ret, tuple)

 ret = get_args()
 self.assertIn(ret, ((), None))


I'm not sure this is the fix we want to keep here, but it was sufficient 
to get the test going and unblock all the buildbots.


I'm not entirely sure when the break appeared (essentially we seem to 
not be copying *args into a new tuple), but I'd guess it's to do with 
the fast calling improvements.


Cheers,
Steve


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Elliot Gorokhovsky
Wow, Tim himself! Regarding performance on semi-ordered data: we'll have to
benchmark to see, but intuitively I imagine radix would meet Timsort
because verifying that a list of strings is sorted takes Omega(nw) (which
gives a lower bound on Timsort), where w is the word length. Radix sort is
Theta(nw). So at least asymptotically it checks out. I think if one uses
the two-array algorithm, other semi-sortings can also be exploited, since
the items get placed into their respective buckets in the order in which
they appear in the list. So, for the example you gave, one pass would sort
it correctly (since the list has the property if x_1 and x_2 are in bucket
b, x1 comes before x2 in the list, so x1 will also come before x2 in the
bucket. Except possibly for one "border bucket" that includes n/2). And
then it would just be Theta(nw/b) in each bucket to verify sorted. I mean
honestly the cool thing about radix is that the best case for Timsort on
strings, Omega(nw), is the worst case for radix! So the point is I think
the two array version, at least, preserves a lot of structure. Anyway, I
hope to have benchmarks (relatively) soon! (I'm a senior in high school so
I'm pretty busy...but I'll try to get on this as soon as I can).


On Sun, Sep 11, 2016 at 3:42 PM Tim Peters  wrote:

> [redirected from python-dev, to python-ideas;
>  please send followups only to python-ideas]
>
> [Elliot Gorokhovsky ]
> > ...
> > TL;DR: Should I spend time making list.sort() detect if it is sorting
> > lexicographically and, if so, use a much faster algorithm?
>
> It will be fun to find out ;-)
>
> As Mark, and especially Terry, pointed out, a major feature of the
> current sort is that it can exploit many kinds of pre-existing order.
> As the paper you referenced says, "Realistic sorting problems are
> usually far from random."  But, although they did run some tests
> against data with significant order, they didn't test against any
> algorithms _aiming_ at exploiting uniformity.  Just against their
> radix sort variants, and against a quicksort.
>
> That's where it's likely next to impossible to guess in advance
> whether radix sort _will_ have a real advantage.  All the kinds of
> order the current sort can exploit are far from obvious, because the
> mechanisms it employs are low-level & very general.  For example,
> consider arrays created by this function, for even `n`:
>
> def bn(n):
> r = [None] * n
> r[0::2] = range(0, n//2)
> r[1::2] = range(n//2, n)
> return r
>
> Then, e.g.,
>
> >>> bn(16)
> [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]
>
> This effectively takes range(n), cuts it in half, and does a "perfect
> shuffle" on the two halves.
>
> You'll find nothing in the current code looking for this as a special
> case, but it nevertheless sorts such arrays in "close to" O(n) time,
> and despite that there's no natural run in the input longer than 2
> elements.
>
> That said, I'd encourage you to write your code as a new list method
> at first, to make it easiest to run benchmarks.  If that proves
> promising, then you can worry about how to make a single method
> auto-decide which algorithm to use.
>
> Also use the two-array version.  It's easier to understand and to
> code, and stability is crucial now.  The extra memory burden doesn't
> bother me - an array of C pointers consumes little memory compared to
> the memory consumed by the Python objects they point at.
>
> Most of all, have fun! :-)
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Petr Viktorin

On 09/11/2016 10:48 PM, Terry Reedy wrote:
[...]

Second, with respect to timsort in particular: timsort is designed to
exploit structure and run faster than O(n*logn) in special cases.  If a
list is already sorted, timsort will do one O(n) scan and stop.  Any
radix sort will take several times longer.  If a list is reverse sorted,
timsort will do one O(n) scan and do an O(n) reverse.  If a list is the
concatenation of two sorted lists, timsort will find the two sorted
sublists and merge them.  If a sorted list has unsorted items appended
to the end, timsort will sort the appended items and then do a merge.  I
expect any radix sort to be slower for all these cases.  Tim Peters
somewhere documented his experiments and results with various special
but plausible cases of non-randomness.


That write-up is included in Python source:
https://github.com/python/cpython/blob/master/Objects/listsort.txt

A good read if you want to know what sort of thinking, benchmarking, and 
justification should go into a new sorting algorithm.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Tim Peters
[redirected from python-dev, to python-ideas;
 please send followups only to python-ideas]

[Elliot Gorokhovsky ]
> ...
> TL;DR: Should I spend time making list.sort() detect if it is sorting
> lexicographically and, if so, use a much faster algorithm?

It will be fun to find out ;-)

As Mark, and especially Terry, pointed out, a major feature of the
current sort is that it can exploit many kinds of pre-existing order.
As the paper you referenced says, "Realistic sorting problems are
usually far from random."  But, although they did run some tests
against data with significant order, they didn't test against any
algorithms _aiming_ at exploiting uniformity.  Just against their
radix sort variants, and against a quicksort.

That's where it's likely next to impossible to guess in advance
whether radix sort _will_ have a real advantage.  All the kinds of
order the current sort can exploit are far from obvious, because the
mechanisms it employs are low-level & very general.  For example,
consider arrays created by this function, for even `n`:

def bn(n):
r = [None] * n
r[0::2] = range(0, n//2)
r[1::2] = range(n//2, n)
return r

Then, e.g.,

>>> bn(16)
[0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]

This effectively takes range(n), cuts it in half, and does a "perfect
shuffle" on the two halves.

You'll find nothing in the current code looking for this as a special
case, but it nevertheless sorts such arrays in "close to" O(n) time,
and despite that there's no natural run in the input longer than 2
elements.

That said, I'd encourage you to write your code as a new list method
at first, to make it easiest to run benchmarks.  If that proves
promising, then you can worry about how to make a single method
auto-decide which algorithm to use.

Also use the two-array version.  It's easier to understand and to
code, and stability is crucial now.  The extra memory burden doesn't
bother me - an array of C pointers consumes little memory compared to
the memory consumed by the Python objects they point at.

Most of all, have fun! :-)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered

2016-09-11 Thread Sven R. Kunze

On 11.09.2016 01:41, Nathaniel Smith wrote:

I feel like I'm missing something here... by this reasoning, we should
*never* change the language spec when new features are added. E.g. if
people use async/await in 3.5 then their code won't be compatible with
3.4, but async/await are still part of the language spec. And in any
case, the distinction between "CPython feature" and "Python
language-spec-guaranteed feature" is *extremely* arcane and
inside-basebally -- it seems really unlikely that most users will even
understand what this distinction means, never mind let it stop them
from writing CPython-and-PyPy-specific code. Emphasizing that this is
a new feature that only exists in 3.6+ of course makes sense, I just
don't understand why that affects the language spec bit.

(OTOH it doesn't matter that much anyway... the language spec is
definitely a useful thing, but it's largely aspirational in practice
-- other implementations target CPython compatibility more than they
target language spec compatibility.)


The new dict has thousands and one advantages: no need to import 
OrderDict anymore, standard syntax for OrderDict, etc.


People will love it. But is it legal to use? I tend to agree with you 
here and say "CPython mostly is the living spec" but I'm not 100% sure 
(I even restrain from writing a blog post about it although it so 
wonderful).


Cheers,
Sven
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Terry Reedy

On 9/11/2016 3:45 AM, Elliot Gorokhovsky wrote:

Hello all,

I am interested in making a non-trivial improvement to list.sort(),


This is non-trivial, and harder than you seem to think it is.

> but

before I put in the work, I want to test the waters and see if this is
something the community would accept.


The debate on proposed enhancements is usually whether they are really 
enhancements, all things considered.  For special-case speedups, the 
'all things' include the frequency of the special cases, the ease of 
detecting them, the thoroughness of testintg, and the maintainability of 
the proposed, and likely more complicated, code.



Basically, I want to implement radix sort for lists of strings.


Radix sort was invented for fixed-length strings of digits, as in 
all-digit id 'numbers', so 10 bins.  Ascii strings need 128 bins, 
general byte strings need 256, still manageable.  General unicode 
strings require 1,114,112 bins, most of which will be empty for most 
characters positions.  This is harder to manage.  So are variable-length 
strings.


In CPython 3.3+, strings at the C level are not just strings but are 1, 
2, or 4 bytes per char strings.  So you could specifically target lists 
of bytes (and bytearrays) and lists of strings limited to 1-byte 
characters.  The same code should pretty much work for both.


> ...

In-place radix sort is significantly faster for lexicographic sorting than
Timsort (or in general any comparison-based sort, since radix can beat
the nlogn barrier).


This unqualified statement is doubly wrong.

First, with respect to sorting in general: 'aysmptotically faster' only 
means faster for 'large enough' tasks.  Many real world tasks are small, 
and big tasks gets broken down into multiple small tasks. 
'Asymptotically slower' algoritms may be faster for small tasks. Tim 
Peters investigated and empirically determined that an O(n*n) binary 
insort, as he optimized it on real machines, is faster than O(n*logn) 
sorting for up to around 64 items.  So timsort uses binary insertion to 
sort up to 64 items.  Similarly, you would have to investigate whether 
there is a size below which timsort is faster.


Second, with respect to timsort in particular: timsort is designed to 
exploit structure and run faster than O(n*logn) in special cases.  If a 
list is already sorted, timsort will do one O(n) scan and stop.  Any 
radix sort will take several times longer.  If a list is reverse sorted, 
timsort will do one O(n) scan and do an O(n) reverse.  If a list is the 
concatenation of two sorted lists, timsort will find the two sorted 
sublists and merge them.  If a sorted list has unsorted items appended 
to the end, timsort will sort the appended items and then do a merge.  I 
expect any radix sort to be slower for all these cases.  Tim Peters 
somewhere documented his experiments and results with various special 
but plausible cases of non-randomness.


What you might propose is this:  if the initial up or down sequence 
already determined by timsort is less than, say, 64 items, and the 
length of the list is more than some empirically determined value,
and all items are bytes or byte-char strings and some further checks 
determine the same, then switch to rsort.  But you should start by 
putting a module on PyPI.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3.7: remove all private C functions from the Python C API?

2016-09-11 Thread Raymond Hettinger

> On Sep 11, 2016, at 1:37 AM, Victor Stinner  wrote:
> 
> For Python 3.7, I propose that we move all these private functions in
> separated header files, maybe Include/private/ or Include/core/, and
> not export them as part of the "regular API".
> 
> The risk is that too many C extensions rely on all these tiny
> "private" functions. Maybe for performance. I don't know.
> 
> What do you think?

I think the risk is limited and inconsequential.  We already document what is 
public, have specifically set aside a "limited" api, and the leading underscore 
private naming convention is both well established and well-understood.  Even 
with pure python code, we make the claim that Python is a consenting adults 
language and that mostly works out just fine.

The downside of the proposal is code churn and an increased maintenance burden. 
  Having more include files to search through doesn't make it easier to learn 
the C code or to maintain it.

Over time, the C code has gotten harder to read with cascades of macros and 
from breaking single concept files into multiple inter-dependent files.


Raymond
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3.7: remove all private C functions from the Python C API?

2016-09-11 Thread Brett Cannon
In general I support cleaning up our C API to more  clearly delineate the
boundaries of what people can rely on and what they shouldn't. Could we go
farther and do the same separation of the base and limited API at the
header level instead of interleaving through #ifndef?

On Sun, Sep 11, 2016, 01:38 Victor Stinner  wrote:

> Hi,
>
> Currently, Python has 3 C API:
>
> * python core API
> * regular API: subset of the core API
> * stable API (ABI?), the Py_LIMITED_API thing: subset of the regular API
>
> For practical purpose, all functions are declared in Include/*.h.
> Basically, Python exposes "everything". There are private functions
> which are exported using PyAPI_FUNC(), whereas they should only be
> used inside Python "core". Technically, I'm not sure that we can get
> ride of PyAPI_FUNC() because the stdlib also has extensions which use
> a few private functions.
>
> For Python 3.7, I propose that we move all these private functions in
> separated header files, maybe Include/private/ or Include/core/, and
> not export them as part of the "regular API".
>
> The risk is that too many C extensions rely on all these tiny
> "private" functions. Maybe for performance. I don't know.
>
> What do you think?
>
> See also the issue #26900, "Exclude the private API from the stable API":
> http://bugs.python.org/issue26900
>
> Victor
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Raymond Hettinger

> On Sep 11, 2016, at 11:58 AM, Mark Dickinson  wrote:
> 
>> So I suppose the thing to do is to benchmark stable radix sort against 
>> timsort and see if it's still worth it.
> 
> Agreed; it would definitely be interesting to see benchmarks for the
> two-array stable sort as well as the American Flag unstable sort.
> (Indeed, I think it would be hard to move the proposal forward without
> such benchmarks.)

In the meantime, can I suggest moving this discussion to python-ideas.

There are many practical issues to be addressed:  
* sort stability
* detecting whether you're dealing with a list of strings
* working with unicode rather than inputs limited to a one-byte alphabet
* dealing with the multiple compact forms of unicode strings (i.e. complex 
internal representation)
* avoiding degenerate cases
* cache performance

The referenced article tells us that "troubles with radix sort are in 
implementation, not in conception".


Raymond


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP520 and absence of __definition_order__

2016-09-11 Thread Ethan Furman

On 09/11/2016 01:55 AM, Victor Stinner wrote:

2016-09-10 3:49 GMT-04:00 Ethan Furman wrote:



With __definition_order__ Enum can display the actual creation order of enum
members and methods, while relying on Enum.__dict__.keys() presents a
jumbled mess with many attributes the user never wrote, the enum members either
appearing /after/ all the methods (even if actually written before), or
entirely absent.


Python 3.5 also returns methods in Enum.__dict__(). So it would be a
new feature, right?


__definition_order__ is (would be) a new feature, yes.


The use case seems to be specific to Enum. Can't you add a new method
which only returns members (ordered by insertion order)?


The use case is specific to any custom metaclass that does more than enhance 
the attributes and/or methods already in the class body.


list(myenum._member_maps.keys()) returns members, sorted by insertion
order. Is it what you want?


That only includes members, not other attributes nor methods.  What I want is 
to make sure the other points of PEP 520 are not forgotten about, and that Enum 
conforms to the accepted PEP.


Code:
---
import enum

class Color(enum.Enum):
 red = 1
 blue = red
 green = 2

print(Color.__dict__.keys())
print(list(Color._member_map_.keys()))
---

Python 3.5:
---
dict_keys(['__module__', '_member_names_', 'green', '_member_type_',
'blue', '_value2member_map_', '_member_map_', '__new__', 'red',
'__doc__'])
['red', 'blue', 'green']
---

Python 3.6:
---
dict_keys(['_generate_next_value_', '__module__', '__doc__',
'_member_names_', '_member_map_', '_member_type_',
'_value2member_map_', 'red', 'blue', 'green', '__new__'])
['red', 'blue', 'green']
---

Note: It seems like dir(myenum) ignores "aliases" like blue=red in my example.


That is intentional.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Mark Dickinson
On Sun, Sep 11, 2016 at 7:43 PM, Elliot Gorokhovsky
 wrote:
> So I suppose the thing to do is to benchmark stable radix sort against 
> timsort and see if it's still worth it.

Agreed; it would definitely be interesting to see benchmarks for the
two-array stable sort as well as the American Flag unstable sort.
(Indeed, I think it would be hard to move the proposal forward without
such benchmarks.)

Apart from the cases already mentioned by Chris, one of the situations
you'll want to include in the benchmarks is the case of a list that's
already almost sorted (e.g., an already sorted list with a few extra
unsorted elements appended). This is a case that does arise in
practice, and that Timsort performs particularly well on.

-- 
Mark
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Elliot Gorokhovsky
The sort can be made stable, but that requires the allocation of an
equal-sized auxiliary array. To quote from the paper: "Both list-based and
two-array sorts entail Θ(n) space overhead. That overhead shrinks to
Θ(logn) in American flag sort, which, like quicksort, trades off stability
for space efficiency." So there are two options: follow C++ in providing a
stable and unstable sort, or just use stable radix sort at the cost of
allocating a scratch array. I understand why the first approach is
essentially impossible, since it could break code written under the
assumption that list.sort() is stable. But I think that in Python, since
the list just holds pointers to objects instead of objects themselves,
being in-place isn't that important: we're missing the cache all the time
anyway since our objects are stored all over the place in memory. So I
suppose the thing to do is to benchmark stable radix sort against timsort
and see if it's still worth it. Again, I really don't think the auxiliary
array would make that much of a difference. Note that in timsort we also
use an auxiliary array.

On Sun, Sep 11, 2016, 12:15 PM Mark Dickinson  wrote:

> > I am interested in making a non-trivial improvement to list.sort() [...]
>
> Would your proposed new sorting algorithm be stable? The language
> currently guarantees stability for `list.sort` and `sorted`.
>
> --
> Mark
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Mark Dickinson
> I am interested in making a non-trivial improvement to list.sort() [...]

Would your proposed new sorting algorithm be stable? The language
currently guarantees stability for `list.sort` and `sorted`.

-- 
Mark
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Elliot Gorokhovsky
Thanks for the link. If you look at the conclusion it says "We recommend
American flag sort as an all-round algorithm for sorting strings."

On Sun, Sep 11, 2016, 11:30 AM Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

>
> > On Sep 11, 2016, at 12:45 AM, Elliot Gorokhovsky 
> wrote:
> >
> > I am interested in making a non-trivial improvement to list.sort(), but
> before I put in the work, I want to test the waters and see if this is
> something the community would accept. Basically, I want to implement radix
> sort for lists of strings. So list.sort() would detect if it is sorting a
> list of strings (which is one of the more common things you sort in python)
> and, if so, use in-place radix sort (see
> https://xlinux.nist.gov/dads/HTML/americanFlagSort.html).
>
> For those who are interested, here is a direct link to the PDF that
> describes the algorithm.
>
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.6990=rep1=pdf
>
>
> Raymond
>
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Raymond Hettinger

> On Sep 11, 2016, at 12:45 AM, Elliot Gorokhovsky  
> wrote:
> 
> I am interested in making a non-trivial improvement to list.sort(), but 
> before I put in the work, I want to test the waters and see if this is 
> something the community would accept. Basically, I want to implement radix 
> sort for lists of strings. So list.sort() would detect if it is sorting a 
> list of strings (which is one of the more common things you sort in python) 
> and, if so, use in-place radix sort (see 
> https://xlinux.nist.gov/dads/HTML/americanFlagSort.html).

For those who are interested, here is a direct link to the PDF that describes 
the algorithm.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.6990=rep1=pdf


Raymond

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Chris Angelico
On Sun, Sep 11, 2016 at 5:45 PM, Elliot Gorokhovsky
 wrote:
> I am interested in making a non-trivial improvement to list.sort(), but
> before I put in the work, I want to test the waters and see if this is
> something the community would accept. Basically, I want to implement radix
> sort for lists of strings. So list.sort() would detect if it is sorting a
> list of strings (which is one of the more common things you sort in python)
> and, if so, use in-place radix sort (see
> https://xlinux.nist.gov/dads/HTML/americanFlagSort.html). In-place radix
> sort is significantly faster for lexicographic sorting than Timsort (or in
> general any comparison-based sort, since radix can beat the nlogn barrier).
> If you don't believe that last claim, suppose for the sake of the argument
> that it's true (because if I actually implemented this I could supply
> benchmarks to prove it).

I'd like to see these benchmarks, actually. Sounds interesting. How
does it fare on different distributions of data - for instance,
strings consisting exclusively of ASCII digits and punctuation eg
"01:12:35,726 --> 01:12:36,810", or strings consisting primarily of
ASCII but with occasional BMP or astral characters, or strings
primarily of Cyrillic text, etc? What if every single string begins
with a slash character (eg if you're sorting a bunch of path names)?
At what list size does it become visibly faster?

Could this be put onto PyPI and then benchmarked as lst.sort() vs
flagsort(lst) ?

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Elliot Gorokhovsky
Hello all,

I am interested in making a non-trivial improvement to list.sort(), but
before I put in the work, I want to test the waters and see if this is
something the community would accept. Basically, I want to implement radix
sort for lists of strings. So list.sort() would detect if it is sorting a
list of strings (which is one of the more common things you sort in python)
and, if so, use in-place radix sort (see https://xlinux.nist.gov/dads/
HTML/americanFlagSort.html). In-place radix sort is significantly faster
for lexicographic sorting than Timsort (or in general any comparison-based
sort, since radix can beat the nlogn barrier). If you don't believe that
last claim, suppose for the sake of the argument that it's true (because if
I actually implemented this I could supply benchmarks to prove it).

The idea is the following: in list.sort(), if using the default comparison
operator, test the type of the first, middle, and last elements (or
something along those lines). If they are all strings, in practice this
means the list is very likely a list of strings, so it's probably worth the
investment to check and see. So we iterate through and see if they really
are all strings (again, in practice it is very unlikely this test would
fail). Luckily, this is very, very nearly free (i.e. no memory cost) since
we have to iterate through anyway as part of the in-place radix sort (first
step is to count how many elements go in each bucket, you iterate through
to count. So we would just put a test in the loop to break if it finds a
non-string).

Additionally, since often one is sorting objects with string or int fields
instead of strings or ints directly, one could check the type of the field
extracted by the key function or something.

Is this something the community would be interested inf? Again, supposing I
benchmark it and show it gives non-trivial improvements on existing
codebases. Like for example I could benchmark the top 10 packages on pypi
and show they run faster using my list.sort().

TL;DR: Should I spend time making list.sort() detect if it is sorting
lexicographically and, if so, use a much faster algorithm?

Thanks for reading,
Elliot
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython: Use HTTP in testPythonOrg

2016-09-11 Thread Eric V. Smith

Hi, Berker.

Could you add a comment to the test on why this should use http? I can 
see this bouncing back and forth between http and https, as people clean 
an up all http usages to be https.


Thanks.
Eric.

On 9/11/2016 8:46 AM, berker.peksag wrote:

https://hg.python.org/cpython/rev/bc085b7e8fd8
changeset:   103634:bc085b7e8fd8
user:Berker Peksag 
date:Sun Sep 11 15:46:47 2016 +0300
summary:
  Use HTTP in testPythonOrg

files:
  Lib/test/test_robotparser.py |  2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/Lib/test/test_robotparser.py b/Lib/test/test_robotparser.py
--- a/Lib/test/test_robotparser.py
+++ b/Lib/test/test_robotparser.py
@@ -276,7 +276,7 @@
 support.requires('network')
 with support.transient_internet('www.python.org'):
 parser = urllib.robotparser.RobotFileParser(
-"https://www.python.org/robots.txt;)
+"http://www.python.org/robots.txt;)
 parser.read()
 self.assertTrue(
 parser.can_fetch("*", "http://www.python.org/robots.txt;))



___
Python-checkins mailing list
python-check...@python.org
https://mail.python.org/mailman/listinfo/python-checkins



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered

2016-09-11 Thread Chris Angelico
On Sun, Sep 11, 2016 at 6:42 PM, Victor Stinner
 wrote:
> 2016-09-10 23:24 GMT-04:00 Nick Coghlan :
>> To conform with the updated language spec, implementations just need
>> to use collections.OrderedDict in 3 places:
>>
>> (...)
>> - storage type for passing kwargs to functions
>
> I'm not sure about the "just need" for this one, especially if you
> care of performances ;-)
>
> I mean, it's not easy to write an *efficient* hash table preserving
> the insertion order. Otherwise, CPython would have one since Python
> 1.5 :-)

Can the requirement for kwargs be weakened to "preserves insertion
order as long as it is not mutated"? That might make it easier on
implementations.

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP520 and absence of __definition_order__

2016-09-11 Thread Victor Stinner
2016-09-10 3:49 GMT-04:00 Ethan Furman :
> With __definition_order__ Enum can display the actual creation order of enum
> members and methods, while relying on Enum.__dict__.keys() presents a
> jumbled mess with many attributes the user never wrote, the enum members 
> either
> appearing /after/ all the methods (even if actually written before), or
> entirely absent.

Python 3.5 also returns methods in Enum.__dict__(). So it would be a
new feature, right?

The use case seems to be specific to Enum. Can't you add a new method
which only returns members (ordered by insertion order)?

list(myenum._member_maps.keys()) returns members, sorted by insertion
order. Is it what you want?

Code:
---
import enum

class Color(enum.Enum):
red = 1
blue = red
green = 2

print(Color.__dict__.keys())
print(list(Color._member_map_.keys()))
---

Python 3.5:
---
dict_keys(['__module__', '_member_names_', 'green', '_member_type_',
'blue', '_value2member_map_', '_member_map_', '__new__', 'red',
'__doc__'])
['red', 'blue', 'green']
---

Python 3.6:
---
dict_keys(['_generate_next_value_', '__module__', '__doc__',
'_member_names_', '_member_map_', '_member_type_',
'_value2member_map_', 'red', 'blue', 'green', '__new__'])
['red', 'blue', 'green']
---

Note: It seems like dir(myenum) ignores "aliases" like blue=red in my example.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered

2016-09-11 Thread Victor Stinner
2016-09-10 23:24 GMT-04:00 Nick Coghlan :
> To conform with the updated language spec, implementations just need
> to use collections.OrderedDict in 3 places:
>
> (...)
> - storage type for passing kwargs to functions

I'm not sure about the "just need" for this one, especially if you
care of performances ;-)

I mean, it's not easy to write an *efficient* hash table preserving
the insertion order. Otherwise, CPython would have one since Python
1.5 :-)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Python 3.7: remove all private C functions from the Python C API?

2016-09-11 Thread Victor Stinner
Hi,

Currently, Python has 3 C API:

* python core API
* regular API: subset of the core API
* stable API (ABI?), the Py_LIMITED_API thing: subset of the regular API

For practical purpose, all functions are declared in Include/*.h.
Basically, Python exposes "everything". There are private functions
which are exported using PyAPI_FUNC(), whereas they should only be
used inside Python "core". Technically, I'm not sure that we can get
ride of PyAPI_FUNC() because the stdlib also has extensions which use
a few private functions.

For Python 3.7, I propose that we move all these private functions in
separated header files, maybe Include/private/ or Include/core/, and
not export them as part of the "regular API".

The risk is that too many C extensions rely on all these tiny
"private" functions. Maybe for performance. I don't know.

What do you think?

See also the issue #26900, "Exclude the private API from the stable API":
http://bugs.python.org/issue26900

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Webmaster] A broken link!

2016-09-11 Thread Steve Holden
On Sat, Sep 10, 2016 at 6:57 PM, Nick Coghlan  wrote:

> P.S. Although in this case, it may have just been a direct link to the
> 3.2 version of the 3.2 What's New - there isn't a lot we can do about
> that, as when a branch goes unsupported, we usually stop updating the
> docs as well (even when external links break)
>

Thanks for the note, Nick. I assumed that it would be something like that.
I know the devs go to great lengths to keep the documentation accurate, but
I personally would rather see efforts put towards current versions.

regards
 Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com