Re: [Python-Dev] [Python-checkins] cpython: Fixes test_getargs2 to get the buildbots working again.
On 11Sep2016 1959, Martin Panter wrote: On 12 September 2016 at 02:48, Steve Dowerwrote: Fixes test_getargs2 to get the buildbots working again. I'm not sure this is the fix we want to keep here, but it was sufficient to get the test going and unblock all the buildbots. I'm not entirely sure when the break appeared (essentially we seem to not be copying *args into a new tuple), but I'd guess it's to do with the fast calling improvements. That seems to be everyone else’s guess too. See https://bugs.python.org/issue28086 (bug about this failure) https://bugs.python.org/issue27213 (bisected cause) Huh, I searched and didn't find anything. Maybe I typo'd my search query? Looking at the bisected cause, it seems like the intent is to allow subclasses of tuple to pass through. Considering this is seriously going to hold up beta 1, I'd rather assume that's the intent and unblock the release. Cheers, Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Python-checkins] cpython: Fixes test_getargs2 to get the buildbots working again.
On 12 September 2016 at 02:48, Steve Dowerwrote: >> Fixes test_getargs2 to get the buildbots working again. > > I'm not sure this is the fix we want to keep here, but it was sufficient to > get the test going and unblock all the buildbots. > > I'm not entirely sure when the break appeared (essentially we seem to not be > copying *args into a new tuple), but I'd guess it's to do with the fast > calling improvements. That seems to be everyone else’s guess too. See https://bugs.python.org/issue28086 (bug about this failure) https://bugs.python.org/issue27213 (bisected cause) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP520 and absence of __definition_order__
On 11 September 2016 at 13:05, Nick Coghlanwrote: > VOC & Batavia *should* be OK (worst case, they return > collections.OrderedDict from __prepare__ and also use it for __dict__ > attributes), but I'm less certain about MicroPython (since I don't > know enough about how its current dict implementation works to know > whether or not they'll be able to make the same change PyPy and > CPython did) Micropython' s Damien George got back to me and indicated that once they get around to working on Python 3.6 compatibility (they're currently still working on 3.5), they'd likely also need to go down the path of using collections.OrderedDict in the situations where the 3.6 language spec calls for it (MicroPython's default dict implementation is less sparse than the CPython one, trading greater memory usage efficiency for an increased risk of hash collisions, so it's unlikely the new implementation would count as "compact" from that perspective). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Python-checkins] cpython: Fixes test_getargs2 to get the buildbots working again.
On 11Sep2016 1944, steve.dower wrote: https://hg.python.org/cpython/rev/7793d34609cb changeset: 103679:7793d34609cb user:Steve Dowerdate:Sun Sep 11 19:43:51 2016 -0700 summary: Fixes test_getargs2 to get the buildbots working again. files: Lib/test/test_getargs2.py | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Lib/test/test_getargs2.py b/Lib/test/test_getargs2.py --- a/Lib/test/test_getargs2.py +++ b/Lib/test/test_getargs2.py @@ -471,7 +471,7 @@ ret = get_args(*TupleSubclass([1, 2])) self.assertEqual(ret, (1, 2)) -self.assertIs(type(ret), tuple) +self.assertIsInstance(ret, tuple) ret = get_args() self.assertIn(ret, ((), None)) I'm not sure this is the fix we want to keep here, but it was sufficient to get the test going and unblock all the buildbots. I'm not entirely sure when the break appeared (essentially we seem to not be copying *args into a new tuple), but I'd guess it's to do with the fast calling improvements. Cheers, Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
Wow, Tim himself! Regarding performance on semi-ordered data: we'll have to benchmark to see, but intuitively I imagine radix would meet Timsort because verifying that a list of strings is sorted takes Omega(nw) (which gives a lower bound on Timsort), where w is the word length. Radix sort is Theta(nw). So at least asymptotically it checks out. I think if one uses the two-array algorithm, other semi-sortings can also be exploited, since the items get placed into their respective buckets in the order in which they appear in the list. So, for the example you gave, one pass would sort it correctly (since the list has the property if x_1 and x_2 are in bucket b, x1 comes before x2 in the list, so x1 will also come before x2 in the bucket. Except possibly for one "border bucket" that includes n/2). And then it would just be Theta(nw/b) in each bucket to verify sorted. I mean honestly the cool thing about radix is that the best case for Timsort on strings, Omega(nw), is the worst case for radix! So the point is I think the two array version, at least, preserves a lot of structure. Anyway, I hope to have benchmarks (relatively) soon! (I'm a senior in high school so I'm pretty busy...but I'll try to get on this as soon as I can). On Sun, Sep 11, 2016 at 3:42 PM Tim Peterswrote: > [redirected from python-dev, to python-ideas; > please send followups only to python-ideas] > > [Elliot Gorokhovsky ] > > ... > > TL;DR: Should I spend time making list.sort() detect if it is sorting > > lexicographically and, if so, use a much faster algorithm? > > It will be fun to find out ;-) > > As Mark, and especially Terry, pointed out, a major feature of the > current sort is that it can exploit many kinds of pre-existing order. > As the paper you referenced says, "Realistic sorting problems are > usually far from random." But, although they did run some tests > against data with significant order, they didn't test against any > algorithms _aiming_ at exploiting uniformity. Just against their > radix sort variants, and against a quicksort. > > That's where it's likely next to impossible to guess in advance > whether radix sort _will_ have a real advantage. All the kinds of > order the current sort can exploit are far from obvious, because the > mechanisms it employs are low-level & very general. For example, > consider arrays created by this function, for even `n`: > > def bn(n): > r = [None] * n > r[0::2] = range(0, n//2) > r[1::2] = range(n//2, n) > return r > > Then, e.g., > > >>> bn(16) > [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15] > > This effectively takes range(n), cuts it in half, and does a "perfect > shuffle" on the two halves. > > You'll find nothing in the current code looking for this as a special > case, but it nevertheless sorts such arrays in "close to" O(n) time, > and despite that there's no natural run in the input longer than 2 > elements. > > That said, I'd encourage you to write your code as a new list method > at first, to make it easiest to run benchmarks. If that proves > promising, then you can worry about how to make a single method > auto-decide which algorithm to use. > > Also use the two-array version. It's easier to understand and to > code, and stability is crucial now. The extra memory burden doesn't > bother me - an array of C pointers consumes little memory compared to > the memory consumed by the Python objects they point at. > > Most of all, have fun! :-) > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
On 09/11/2016 10:48 PM, Terry Reedy wrote: [...] Second, with respect to timsort in particular: timsort is designed to exploit structure and run faster than O(n*logn) in special cases. If a list is already sorted, timsort will do one O(n) scan and stop. Any radix sort will take several times longer. If a list is reverse sorted, timsort will do one O(n) scan and do an O(n) reverse. If a list is the concatenation of two sorted lists, timsort will find the two sorted sublists and merge them. If a sorted list has unsorted items appended to the end, timsort will sort the appended items and then do a merge. I expect any radix sort to be slower for all these cases. Tim Peters somewhere documented his experiments and results with various special but plausible cases of non-randomness. That write-up is included in Python source: https://github.com/python/cpython/blob/master/Objects/listsort.txt A good read if you want to know what sort of thinking, benchmarking, and justification should go into a new sorting algorithm. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
[redirected from python-dev, to python-ideas; please send followups only to python-ideas] [Elliot Gorokhovsky] > ... > TL;DR: Should I spend time making list.sort() detect if it is sorting > lexicographically and, if so, use a much faster algorithm? It will be fun to find out ;-) As Mark, and especially Terry, pointed out, a major feature of the current sort is that it can exploit many kinds of pre-existing order. As the paper you referenced says, "Realistic sorting problems are usually far from random." But, although they did run some tests against data with significant order, they didn't test against any algorithms _aiming_ at exploiting uniformity. Just against their radix sort variants, and against a quicksort. That's where it's likely next to impossible to guess in advance whether radix sort _will_ have a real advantage. All the kinds of order the current sort can exploit are far from obvious, because the mechanisms it employs are low-level & very general. For example, consider arrays created by this function, for even `n`: def bn(n): r = [None] * n r[0::2] = range(0, n//2) r[1::2] = range(n//2, n) return r Then, e.g., >>> bn(16) [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15] This effectively takes range(n), cuts it in half, and does a "perfect shuffle" on the two halves. You'll find nothing in the current code looking for this as a special case, but it nevertheless sorts such arrays in "close to" O(n) time, and despite that there's no natural run in the input longer than 2 elements. That said, I'd encourage you to write your code as a new list method at first, to make it easiest to run benchmarks. If that proves promising, then you can worry about how to make a single method auto-decide which algorithm to use. Also use the two-array version. It's easier to understand and to code, and stability is crucial now. The extra memory burden doesn't bother me - an array of C pointers consumes little memory compared to the memory consumed by the Python objects they point at. Most of all, have fun! :-) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
On 11.09.2016 01:41, Nathaniel Smith wrote: I feel like I'm missing something here... by this reasoning, we should *never* change the language spec when new features are added. E.g. if people use async/await in 3.5 then their code won't be compatible with 3.4, but async/await are still part of the language spec. And in any case, the distinction between "CPython feature" and "Python language-spec-guaranteed feature" is *extremely* arcane and inside-basebally -- it seems really unlikely that most users will even understand what this distinction means, never mind let it stop them from writing CPython-and-PyPy-specific code. Emphasizing that this is a new feature that only exists in 3.6+ of course makes sense, I just don't understand why that affects the language spec bit. (OTOH it doesn't matter that much anyway... the language spec is definitely a useful thing, but it's largely aspirational in practice -- other implementations target CPython compatibility more than they target language spec compatibility.) The new dict has thousands and one advantages: no need to import OrderDict anymore, standard syntax for OrderDict, etc. People will love it. But is it legal to use? I tend to agree with you here and say "CPython mostly is the living spec" but I'm not 100% sure (I even restrain from writing a blog post about it although it so wonderful). Cheers, Sven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
On 9/11/2016 3:45 AM, Elliot Gorokhovsky wrote: Hello all, I am interested in making a non-trivial improvement to list.sort(), This is non-trivial, and harder than you seem to think it is. > but before I put in the work, I want to test the waters and see if this is something the community would accept. The debate on proposed enhancements is usually whether they are really enhancements, all things considered. For special-case speedups, the 'all things' include the frequency of the special cases, the ease of detecting them, the thoroughness of testintg, and the maintainability of the proposed, and likely more complicated, code. Basically, I want to implement radix sort for lists of strings. Radix sort was invented for fixed-length strings of digits, as in all-digit id 'numbers', so 10 bins. Ascii strings need 128 bins, general byte strings need 256, still manageable. General unicode strings require 1,114,112 bins, most of which will be empty for most characters positions. This is harder to manage. So are variable-length strings. In CPython 3.3+, strings at the C level are not just strings but are 1, 2, or 4 bytes per char strings. So you could specifically target lists of bytes (and bytearrays) and lists of strings limited to 1-byte characters. The same code should pretty much work for both. > ... In-place radix sort is significantly faster for lexicographic sorting than Timsort (or in general any comparison-based sort, since radix can beat the nlogn barrier). This unqualified statement is doubly wrong. First, with respect to sorting in general: 'aysmptotically faster' only means faster for 'large enough' tasks. Many real world tasks are small, and big tasks gets broken down into multiple small tasks. 'Asymptotically slower' algoritms may be faster for small tasks. Tim Peters investigated and empirically determined that an O(n*n) binary insort, as he optimized it on real machines, is faster than O(n*logn) sorting for up to around 64 items. So timsort uses binary insertion to sort up to 64 items. Similarly, you would have to investigate whether there is a size below which timsort is faster. Second, with respect to timsort in particular: timsort is designed to exploit structure and run faster than O(n*logn) in special cases. If a list is already sorted, timsort will do one O(n) scan and stop. Any radix sort will take several times longer. If a list is reverse sorted, timsort will do one O(n) scan and do an O(n) reverse. If a list is the concatenation of two sorted lists, timsort will find the two sorted sublists and merge them. If a sorted list has unsorted items appended to the end, timsort will sort the appended items and then do a merge. I expect any radix sort to be slower for all these cases. Tim Peters somewhere documented his experiments and results with various special but plausible cases of non-randomness. What you might propose is this: if the initial up or down sequence already determined by timsort is less than, say, 64 items, and the length of the list is more than some empirically determined value, and all items are bytes or byte-char strings and some further checks determine the same, then switch to rsort. But you should start by putting a module on PyPI. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3.7: remove all private C functions from the Python C API?
> On Sep 11, 2016, at 1:37 AM, Victor Stinnerwrote: > > For Python 3.7, I propose that we move all these private functions in > separated header files, maybe Include/private/ or Include/core/, and > not export them as part of the "regular API". > > The risk is that too many C extensions rely on all these tiny > "private" functions. Maybe for performance. I don't know. > > What do you think? I think the risk is limited and inconsequential. We already document what is public, have specifically set aside a "limited" api, and the leading underscore private naming convention is both well established and well-understood. Even with pure python code, we make the claim that Python is a consenting adults language and that mostly works out just fine. The downside of the proposal is code churn and an increased maintenance burden. Having more include files to search through doesn't make it easier to learn the C code or to maintain it. Over time, the C code has gotten harder to read with cascades of macros and from breaking single concept files into multiple inter-dependent files. Raymond ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3.7: remove all private C functions from the Python C API?
In general I support cleaning up our C API to more clearly delineate the boundaries of what people can rely on and what they shouldn't. Could we go farther and do the same separation of the base and limited API at the header level instead of interleaving through #ifndef? On Sun, Sep 11, 2016, 01:38 Victor Stinnerwrote: > Hi, > > Currently, Python has 3 C API: > > * python core API > * regular API: subset of the core API > * stable API (ABI?), the Py_LIMITED_API thing: subset of the regular API > > For practical purpose, all functions are declared in Include/*.h. > Basically, Python exposes "everything". There are private functions > which are exported using PyAPI_FUNC(), whereas they should only be > used inside Python "core". Technically, I'm not sure that we can get > ride of PyAPI_FUNC() because the stdlib also has extensions which use > a few private functions. > > For Python 3.7, I propose that we move all these private functions in > separated header files, maybe Include/private/ or Include/core/, and > not export them as part of the "regular API". > > The risk is that too many C extensions rely on all these tiny > "private" functions. Maybe for performance. I don't know. > > What do you think? > > See also the issue #26900, "Exclude the private API from the stable API": > http://bugs.python.org/issue26900 > > Victor > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
> On Sep 11, 2016, at 11:58 AM, Mark Dickinsonwrote: > >> So I suppose the thing to do is to benchmark stable radix sort against >> timsort and see if it's still worth it. > > Agreed; it would definitely be interesting to see benchmarks for the > two-array stable sort as well as the American Flag unstable sort. > (Indeed, I think it would be hard to move the proposal forward without > such benchmarks.) In the meantime, can I suggest moving this discussion to python-ideas. There are many practical issues to be addressed: * sort stability * detecting whether you're dealing with a list of strings * working with unicode rather than inputs limited to a one-byte alphabet * dealing with the multiple compact forms of unicode strings (i.e. complex internal representation) * avoiding degenerate cases * cache performance The referenced article tells us that "troubles with radix sort are in implementation, not in conception". Raymond ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP520 and absence of __definition_order__
On 09/11/2016 01:55 AM, Victor Stinner wrote: 2016-09-10 3:49 GMT-04:00 Ethan Furman wrote: With __definition_order__ Enum can display the actual creation order of enum members and methods, while relying on Enum.__dict__.keys() presents a jumbled mess with many attributes the user never wrote, the enum members either appearing /after/ all the methods (even if actually written before), or entirely absent. Python 3.5 also returns methods in Enum.__dict__(). So it would be a new feature, right? __definition_order__ is (would be) a new feature, yes. The use case seems to be specific to Enum. Can't you add a new method which only returns members (ordered by insertion order)? The use case is specific to any custom metaclass that does more than enhance the attributes and/or methods already in the class body. list(myenum._member_maps.keys()) returns members, sorted by insertion order. Is it what you want? That only includes members, not other attributes nor methods. What I want is to make sure the other points of PEP 520 are not forgotten about, and that Enum conforms to the accepted PEP. Code: --- import enum class Color(enum.Enum): red = 1 blue = red green = 2 print(Color.__dict__.keys()) print(list(Color._member_map_.keys())) --- Python 3.5: --- dict_keys(['__module__', '_member_names_', 'green', '_member_type_', 'blue', '_value2member_map_', '_member_map_', '__new__', 'red', '__doc__']) ['red', 'blue', 'green'] --- Python 3.6: --- dict_keys(['_generate_next_value_', '__module__', '__doc__', '_member_names_', '_member_map_', '_member_type_', '_value2member_map_', 'red', 'blue', 'green', '__new__']) ['red', 'blue', 'green'] --- Note: It seems like dir(myenum) ignores "aliases" like blue=red in my example. That is intentional. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
On Sun, Sep 11, 2016 at 7:43 PM, Elliot Gorokhovskywrote: > So I suppose the thing to do is to benchmark stable radix sort against > timsort and see if it's still worth it. Agreed; it would definitely be interesting to see benchmarks for the two-array stable sort as well as the American Flag unstable sort. (Indeed, I think it would be hard to move the proposal forward without such benchmarks.) Apart from the cases already mentioned by Chris, one of the situations you'll want to include in the benchmarks is the case of a list that's already almost sorted (e.g., an already sorted list with a few extra unsorted elements appended). This is a case that does arise in practice, and that Timsort performs particularly well on. -- Mark ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
The sort can be made stable, but that requires the allocation of an equal-sized auxiliary array. To quote from the paper: "Both list-based and two-array sorts entail Θ(n) space overhead. That overhead shrinks to Θ(logn) in American flag sort, which, like quicksort, trades off stability for space efficiency." So there are two options: follow C++ in providing a stable and unstable sort, or just use stable radix sort at the cost of allocating a scratch array. I understand why the first approach is essentially impossible, since it could break code written under the assumption that list.sort() is stable. But I think that in Python, since the list just holds pointers to objects instead of objects themselves, being in-place isn't that important: we're missing the cache all the time anyway since our objects are stored all over the place in memory. So I suppose the thing to do is to benchmark stable radix sort against timsort and see if it's still worth it. Again, I really don't think the auxiliary array would make that much of a difference. Note that in timsort we also use an auxiliary array. On Sun, Sep 11, 2016, 12:15 PM Mark Dickinsonwrote: > > I am interested in making a non-trivial improvement to list.sort() [...] > > Would your proposed new sorting algorithm be stable? The language > currently guarantees stability for `list.sort` and `sorted`. > > -- > Mark > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
> I am interested in making a non-trivial improvement to list.sort() [...] Would your proposed new sorting algorithm be stable? The language currently guarantees stability for `list.sort` and `sorted`. -- Mark ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
Thanks for the link. If you look at the conclusion it says "We recommend American flag sort as an all-round algorithm for sorting strings." On Sun, Sep 11, 2016, 11:30 AM Raymond Hettinger < raymond.hettin...@gmail.com> wrote: > > > On Sep 11, 2016, at 12:45 AM, Elliot Gorokhovsky> wrote: > > > > I am interested in making a non-trivial improvement to list.sort(), but > before I put in the work, I want to test the waters and see if this is > something the community would accept. Basically, I want to implement radix > sort for lists of strings. So list.sort() would detect if it is sorting a > list of strings (which is one of the more common things you sort in python) > and, if so, use in-place radix sort (see > https://xlinux.nist.gov/dads/HTML/americanFlagSort.html). > > For those who are interested, here is a direct link to the PDF that > describes the algorithm. > > > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.6990=rep1=pdf > > > Raymond > > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
> On Sep 11, 2016, at 12:45 AM, Elliot Gorokhovsky> wrote: > > I am interested in making a non-trivial improvement to list.sort(), but > before I put in the work, I want to test the waters and see if this is > something the community would accept. Basically, I want to implement radix > sort for lists of strings. So list.sort() would detect if it is sorting a > list of strings (which is one of the more common things you sort in python) > and, if so, use in-place radix sort (see > https://xlinux.nist.gov/dads/HTML/americanFlagSort.html). For those who are interested, here is a direct link to the PDF that describes the algorithm. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.6990=rep1=pdf Raymond ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Drastically improving list.sort() for lists of strings/ints
On Sun, Sep 11, 2016 at 5:45 PM, Elliot Gorokhovskywrote: > I am interested in making a non-trivial improvement to list.sort(), but > before I put in the work, I want to test the waters and see if this is > something the community would accept. Basically, I want to implement radix > sort for lists of strings. So list.sort() would detect if it is sorting a > list of strings (which is one of the more common things you sort in python) > and, if so, use in-place radix sort (see > https://xlinux.nist.gov/dads/HTML/americanFlagSort.html). In-place radix > sort is significantly faster for lexicographic sorting than Timsort (or in > general any comparison-based sort, since radix can beat the nlogn barrier). > If you don't believe that last claim, suppose for the sake of the argument > that it's true (because if I actually implemented this I could supply > benchmarks to prove it). I'd like to see these benchmarks, actually. Sounds interesting. How does it fare on different distributions of data - for instance, strings consisting exclusively of ASCII digits and punctuation eg "01:12:35,726 --> 01:12:36,810", or strings consisting primarily of ASCII but with occasional BMP or astral characters, or strings primarily of Cyrillic text, etc? What if every single string begins with a slash character (eg if you're sorting a bunch of path names)? At what list size does it become visibly faster? Could this be put onto PyPI and then benchmarked as lst.sort() vs flagsort(lst) ? ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Drastically improving list.sort() for lists of strings/ints
Hello all, I am interested in making a non-trivial improvement to list.sort(), but before I put in the work, I want to test the waters and see if this is something the community would accept. Basically, I want to implement radix sort for lists of strings. So list.sort() would detect if it is sorting a list of strings (which is one of the more common things you sort in python) and, if so, use in-place radix sort (see https://xlinux.nist.gov/dads/ HTML/americanFlagSort.html). In-place radix sort is significantly faster for lexicographic sorting than Timsort (or in general any comparison-based sort, since radix can beat the nlogn barrier). If you don't believe that last claim, suppose for the sake of the argument that it's true (because if I actually implemented this I could supply benchmarks to prove it). The idea is the following: in list.sort(), if using the default comparison operator, test the type of the first, middle, and last elements (or something along those lines). If they are all strings, in practice this means the list is very likely a list of strings, so it's probably worth the investment to check and see. So we iterate through and see if they really are all strings (again, in practice it is very unlikely this test would fail). Luckily, this is very, very nearly free (i.e. no memory cost) since we have to iterate through anyway as part of the in-place radix sort (first step is to count how many elements go in each bucket, you iterate through to count. So we would just put a test in the loop to break if it finds a non-string). Additionally, since often one is sorting objects with string or int fields instead of strings or ints directly, one could check the type of the field extracted by the key function or something. Is this something the community would be interested inf? Again, supposing I benchmark it and show it gives non-trivial improvements on existing codebases. Like for example I could benchmark the top 10 packages on pypi and show they run faster using my list.sort(). TL;DR: Should I spend time making list.sort() detect if it is sorting lexicographically and, if so, use a much faster algorithm? Thanks for reading, Elliot ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Python-checkins] cpython: Use HTTP in testPythonOrg
Hi, Berker. Could you add a comment to the test on why this should use http? I can see this bouncing back and forth between http and https, as people clean an up all http usages to be https. Thanks. Eric. On 9/11/2016 8:46 AM, berker.peksag wrote: https://hg.python.org/cpython/rev/bc085b7e8fd8 changeset: 103634:bc085b7e8fd8 user:Berker Peksagdate:Sun Sep 11 15:46:47 2016 +0300 summary: Use HTTP in testPythonOrg files: Lib/test/test_robotparser.py | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Lib/test/test_robotparser.py b/Lib/test/test_robotparser.py --- a/Lib/test/test_robotparser.py +++ b/Lib/test/test_robotparser.py @@ -276,7 +276,7 @@ support.requires('network') with support.transient_internet('www.python.org'): parser = urllib.robotparser.RobotFileParser( -"https://www.python.org/robots.txt;) +"http://www.python.org/robots.txt;) parser.read() self.assertTrue( parser.can_fetch("*", "http://www.python.org/robots.txt;)) ___ Python-checkins mailing list python-check...@python.org https://mail.python.org/mailman/listinfo/python-checkins ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
On Sun, Sep 11, 2016 at 6:42 PM, Victor Stinnerwrote: > 2016-09-10 23:24 GMT-04:00 Nick Coghlan : >> To conform with the updated language spec, implementations just need >> to use collections.OrderedDict in 3 places: >> >> (...) >> - storage type for passing kwargs to functions > > I'm not sure about the "just need" for this one, especially if you > care of performances ;-) > > I mean, it's not easy to write an *efficient* hash table preserving > the insertion order. Otherwise, CPython would have one since Python > 1.5 :-) Can the requirement for kwargs be weakened to "preserves insertion order as long as it is not mutated"? That might make it easier on implementations. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP520 and absence of __definition_order__
2016-09-10 3:49 GMT-04:00 Ethan Furman: > With __definition_order__ Enum can display the actual creation order of enum > members and methods, while relying on Enum.__dict__.keys() presents a > jumbled mess with many attributes the user never wrote, the enum members > either > appearing /after/ all the methods (even if actually written before), or > entirely absent. Python 3.5 also returns methods in Enum.__dict__(). So it would be a new feature, right? The use case seems to be specific to Enum. Can't you add a new method which only returns members (ordered by insertion order)? list(myenum._member_maps.keys()) returns members, sorted by insertion order. Is it what you want? Code: --- import enum class Color(enum.Enum): red = 1 blue = red green = 2 print(Color.__dict__.keys()) print(list(Color._member_map_.keys())) --- Python 3.5: --- dict_keys(['__module__', '_member_names_', 'green', '_member_type_', 'blue', '_value2member_map_', '_member_map_', '__new__', 'red', '__doc__']) ['red', 'blue', 'green'] --- Python 3.6: --- dict_keys(['_generate_next_value_', '__module__', '__doc__', '_member_names_', '_member_map_', '_member_type_', '_value2member_map_', 'red', 'blue', 'green', '__new__']) ['red', 'blue', 'green'] --- Note: It seems like dir(myenum) ignores "aliases" like blue=red in my example. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
2016-09-10 23:24 GMT-04:00 Nick Coghlan: > To conform with the updated language spec, implementations just need > to use collections.OrderedDict in 3 places: > > (...) > - storage type for passing kwargs to functions I'm not sure about the "just need" for this one, especially if you care of performances ;-) I mean, it's not easy to write an *efficient* hash table preserving the insertion order. Otherwise, CPython would have one since Python 1.5 :-) Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Python 3.7: remove all private C functions from the Python C API?
Hi, Currently, Python has 3 C API: * python core API * regular API: subset of the core API * stable API (ABI?), the Py_LIMITED_API thing: subset of the regular API For practical purpose, all functions are declared in Include/*.h. Basically, Python exposes "everything". There are private functions which are exported using PyAPI_FUNC(), whereas they should only be used inside Python "core". Technically, I'm not sure that we can get ride of PyAPI_FUNC() because the stdlib also has extensions which use a few private functions. For Python 3.7, I propose that we move all these private functions in separated header files, maybe Include/private/ or Include/core/, and not export them as part of the "regular API". The risk is that too many C extensions rely on all these tiny "private" functions. Maybe for performance. I don't know. What do you think? See also the issue #26900, "Exclude the private API from the stable API": http://bugs.python.org/issue26900 Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Webmaster] A broken link!
On Sat, Sep 10, 2016 at 6:57 PM, Nick Coghlanwrote: > P.S. Although in this case, it may have just been a direct link to the > 3.2 version of the 3.2 What's New - there isn't a lot we can do about > that, as when a branch goes unsupported, we usually stop updating the > docs as well (even when external links break) > Thanks for the note, Nick. I assumed that it would be something like that. I know the devs go to great lengths to keep the documentation accurate, but I personally would rather see efforts put towards current versions. regards Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com