Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Steven D'Aprano
On Fri, Jun 06, 2014 at 12:51:11PM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >(1) I asked if it would be okay for MicroPython to *optionally* use > >nominally Unicode strings limited to ASCII. Pretty much the only > >response to this has been Guido saying "That would be a pretty lousy >

[Python-Dev] Internal representation of strings and Micropython (Steven D'Aprano's summary)

2014-06-05 Thread Jim J. Jewett
Steven D'Aprano wrote: > (1) I asked if it would be okay for MicroPython to *optionally* use > nominally Unicode strings limited to ASCII. Pretty much the only > response to this has been Guido saying "That would be a pretty lousy > option", and since nobody has really defended the suggestion,

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Greg Ewing
Paul Sokolovsky wrote: All these changes are what let me dream on and speculate on possibility that Python4 could offer an encoding-neutral string type (which means based on bytes) Can you elaborate on exactly what you have in mind? You seem to want something different from Python 3 str, Python

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Nikolaus Rath
Nathaniel Smith writes:
>> > tmp1 = a + b
>> > tmp1 += c
>> > tmp1 /= c
>> > result = tmp1
>>
>> Could this transformation be done in the ast? And would that help?
>
> I don't think it could be done in the ast because I don't think you can
> work with anonymous temporaries there. B

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Greg Ewing
Nathaniel Smith wrote: I'd be a little nervous about whether anyone has implemented, say, an iadd with side effects such that you can tell whether a copy was made, even if the object being copied is immediately destroyed. I can think of at least one plausible scenario where this could occur:

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Sturla Molden
On 05/06/14 22:51, Nathaniel Smith wrote: This gets evaluated as:

    tmp1 = a + b
    tmp2 = tmp1 + c
    result = tmp2 / c

All these temporaries are very expensive. Suppose that a, b, c are arrays with N bytes each, and N is large. For simple arithmetic like this, then costs are dominated by
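To make the cost of those temporaries concrete, here is a minimal sketch in pure Python. `CountingArray` is a hypothetical toy class, not numpy and not anything from the thread; it only counts how many result buffers are allocated when `(a + b + c) / c` is evaluated naively versus with the in-place rewrite discussed in this thread.

```python
class CountingArray:
    """Toy stand-in for an array type (hypothetical); counts buffer allocations."""

    allocations = 0

    def __init__(self, data):
        self.data = list(data)
        CountingArray.allocations += 1

    def __add__(self, other):
        # Out-of-place add: always allocates a new result buffer.
        return CountingArray(x + y for x, y in zip(self.data, other.data))

    def __truediv__(self, other):
        # Out-of-place divide: always allocates a new result buffer.
        return CountingArray(x / y for x, y in zip(self.data, other.data))

    def __iadd__(self, other):
        # In-place add: reuses the existing buffer.
        for i, y in enumerate(other.data):
            self.data[i] += y
        return self

    def __itruediv__(self, other):
        # In-place divide: reuses the existing buffer.
        for i, y in enumerate(other.data):
            self.data[i] /= y
        return self


def count_allocations(fn):
    """Run fn() and return (result, number of buffers allocated while it ran)."""
    before = CountingArray.allocations
    result = fn()
    return result, CountingArray.allocations - before


a = CountingArray([1.0, 2.0])
b = CountingArray([3.0, 4.0])
c = CountingArray([2.0, 4.0])


def naive():
    return (a + b + c) / c


def elided():
    tmp = a + b   # one unavoidable allocation
    tmp += c      # reuses tmp's buffer
    tmp /= c      # reuses tmp's buffer
    return tmp


naive_result, naive_allocs = count_allocations(naive)
elided_result, elided_allocs = count_allocations(elided)
print(naive_allocs, elided_allocs)
```

In this toy model the naive form allocates one buffer per operator, while the rewritten form allocates only the first one; the numeric results are identical.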

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Greg Ewing
Nathaniel Smith wrote: I.e., BIN_ADD could do

    if (Py_REFCNT(left) == 1)
        result = PyNumber_InPlaceAdd(left, right);
    else
        result = PyNumber_Add(left, right)

Upside: all packages automagically benefit! Potential downsides to consider: - Subtle but real and user-visible change in Python se
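The dispatch rule quoted above can be modelled in a few lines of Python. This is only a toy under stated assumptions: `Buffer` and `bin_add` are hypothetical names, and the `left_refcount` argument is passed in by hand as a stand-in for `Py_REFCNT(left)`, since the real check would live inside the interpreter's eval loop.

```python
import operator


class Buffer:
    """Toy array-like type (hypothetical) with distinct + and += behaviour."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Out-of-place: returns a fresh object.
        return Buffer(x + y for x, y in zip(self.values, other.values))

    def __iadd__(self, other):
        # In-place: mutates and returns self.
        for i, y in enumerate(other.values):
            self.values[i] += y
        return self


def bin_add(left, right, left_refcount):
    """Toy model of the proposed BIN_ADD; left_refcount stands in for Py_REFCNT(left)."""
    if left_refcount == 1:
        # Nobody else can observe `left`, so reusing its storage is safe.
        return operator.iadd(left, right)
    # Someone else holds a reference, so `left` must not be modified.
    return operator.add(left, right)


temporary = Buffer([1, 2])
reused = bin_add(temporary, Buffer([10, 20]), left_refcount=1)

shared = Buffer([1, 2])
fresh = bin_add(shared, Buffer([10, 20]), left_refcount=2)
```

The second call illustrates the semantic concern raised in the thread: if the reference count is wrong or misleading, the in-place path would silently modify an object that other code can still see.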

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Chris Angelico
On Fri, Jun 6, 2014 at 11:47 AM, Nathaniel Smith wrote: > Unfortunately we don't actually know whether Cython is the only culprit > (such code *could* be written by hand), and even if we fixed Cython it would > take some unknowable amount of time before all downstream users upgraded > their Cython

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Nathaniel Smith
On 5 Jun 2014 23:58, "Terry Reedy" wrote: > > On 6/5/2014 4:51 PM, Nathaniel Smith wrote: > >> In fact, AFAICT it's 100% correct for libraries being called by >> regular python code (which is why I'm able to quote benchmarks at you >> :-)). The bytecode eval loop always holds a reference to all op

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Nathaniel Smith
On 6 Jun 2014 02:16, "Nikolaus Rath" wrote: > > Nathaniel Smith writes: > > Such optimizations are important enough that numpy operations always > > give the option of explicitly specifying the output array (like > > in-place operators but more general and with clumsier syntax). Here's > > an exa

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Nikolaus Rath
Nathaniel Smith writes: > Such optimizations are important enough that numpy operations always > give the option of explicitly specifying the output array (like > in-place operators but more general and with clumsier syntax). Here's > an example small-array benchmark that IIUC uses Jacobi iteratio

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Greg Ewing
Steven D'Aprano wrote: (1) I asked if it would be okay for MicroPython to *optionally* use nominally Unicode strings limited to ASCII. Pretty much the only response to this has been Guido saying "That would be a pretty lousy option", It would be limiting to have this as the *only* way of deali

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Nick Coghlan
On 6 Jun 2014 05:13, "Glenn Linderman" wrote: > > On 6/5/2014 11:41 AM, Daniel Holth wrote: >> >> discover new things >> like dance-encoded strings, bytes decoded using an incorrect encoding >> intended to be transcoded into the correct encoding later, surrogates >> that work perfectly until .enco

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Nathaniel Smith
On Thu, Jun 5, 2014 at 11:12 PM, Paul Moore wrote: > On 5 June 2014 22:47, Nathaniel Smith wrote: >> To make sure I understand correctly, you're suggesting something like >> adding a new set of special method slots, __te_add__, __te_mul__, >> etc. > > I wasn't thinking in that much detail, TBH. I

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Terry Reedy
On 6/5/2014 4:51 PM, Nathaniel Smith wrote: In fact, AFAICT it's 100% correct for libraries being called by regular python code (which is why I'm able to quote benchmarks at you :-)). The bytecode eval loop always holds a reference to all operands, and then immediately DECREFs them after the ope

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Paul Moore
On 5 June 2014 22:47, Nathaniel Smith wrote: > To make sure I understand correctly, you're suggesting something like > adding a new set of special method slots, __te_add__, __te_mul__, > etc. I wasn't thinking in that much detail, TBH. I'm not sure adding a whole set of new slots is sensible for

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Nathaniel Smith
On Thu, Jun 5, 2014 at 10:37 PM, Paul Moore wrote: > On 5 June 2014 21:51, Nathaniel Smith wrote: >> Is there a better idea I'm missing? > > Just a thought, but the temporaries come from the stack manipulation > done by the likes of the BINARY_ADD opcode. (After all the bytecode > doesn't use tem

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Paul Moore
On 5 June 2014 21:51, Nathaniel Smith wrote: > Is there a better idea I'm missing? Just a thought, but the temporaries come from the stack manipulation done by the likes of the BINARY_ADD opcode. (After all the bytecode doesn't use temporaries, it's a stack machine). Maybe BINARY_ADD and friends

[Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Nathaniel Smith
Hi all, There's a very valuable optimization -- temporary elision -- which numpy can *almost* do. It gives something like a 10-30% speedup for lots of common real-world expressions. It would probably be useful for non-numpy code too. (In fact it generalizes the str += str special case that's

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Antoine Pitrou
On 04/06/2014 02:51, Chris Angelico wrote: On Wed, Jun 4, 2014 at 3:17 PM, Nick Coghlan wrote: It would. The downsides of a UTF-8 representation would be slower iteration and much slower (O(N)) indexing/slicing. There's no reason for iteration to be slower. Slicing would get O(slice offset
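The trade-off being argued here can be illustrated with a small sketch that treats a `bytes` object as a UTF-8 string. The function names (`utf8_char_len`, `codepoint_at`, `iter_codepoints`) are hypothetical and are not taken from MicroPython or CPython; the sketch assumes well-formed UTF-8 input.

```python
def utf8_char_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with lead byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def codepoint_at(data, index):
    """Return code point number `index`; cost grows with index (O(N) scan)."""
    pos = 0
    for _ in range(index):
        if pos >= len(data):
            raise IndexError("string index out of range")
        pos += utf8_char_len(data[pos])
    if pos >= len(data):
        raise IndexError("string index out of range")
    return data[pos:pos + utf8_char_len(data[pos])].decode("utf-8")


def iter_codepoints(data):
    """Yield code points in order; each step is O(1), so iteration stays O(N) overall."""
    pos = 0
    while pos < len(data):
        length = utf8_char_len(data[pos])
        yield data[pos:pos + length].decode("utf-8")
        pos += length


text = "a\u00e9\u20ac\U0001F600z"   # 1-, 2-, 3-, 4- and 1-byte sequences
data = text.encode("utf-8")
indexed = [codepoint_at(data, i) for i in range(len(text))]
iterated = list(iter_codepoints(data))
```

Indexing has to walk from the start of the buffer every time, whereas sequential iteration only ever advances by one sequence, which is the distinction the messages above are drawing.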

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Glenn Linderman
On 6/5/2014 11:41 AM, Daniel Holth wrote: discover new things like dance-encoded strings, bytes decoded using an incorrect encoding intended to be transcoded into the correct encoding later, surrogates that work perfectly until .encode(), str(bytes), APIs that disagree with you about whether the

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Glenn Linderman
On 6/5/2014 3:10 AM, Paul Sokolovsky wrote: Hello, On Wed, 04 Jun 2014 22:15:30 -0400 Terry Reedy wrote: think you are again batting at a strawman. If you mean 'read from a file', and all you want to do is read bytes from and write bytes to external 'files', then there is obviously no need to

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Daniel Holth
On Thu, Jun 5, 2014 at 11:59 AM, Paul Moore wrote: > On 5 June 2014 14:15, Nick Coghlan wrote: >> As I've said before in other contexts, find me Windows, Mac OS X and >> JVM developers, or educators and scientists that are as concerned by >> the text model changes as folks that are primarily focu

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Paul Moore
On 5 June 2014 14:15, Nick Coghlan wrote: > As I've said before in other contexts, find me Windows, Mac OS X and > JVM developers, or educators and scientists that are as concerned by > the text model changes as folks that are primarily focused on Linux > system (including network) programming, an

Re: [Python-Dev] Request: new "Asyncio" component on the bug tracker

2014-06-05 Thread R. David Murray
On Thu, 05 Jun 2014 12:03:15 +0200, Victor Stinner wrote: > Would it be possible to add a new "Asyncio" component on > bugs.python.org? If this component is selected, the default nosy list > for asyncio would be used (guido, yury and me, there is already such > list in the nosy list completion).

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Steven D'Aprano
On Wed, Jun 04, 2014 at 11:17:18AM +1000, Steven D'Aprano wrote: > There is a discussion over at MicroPython about the internal > representation of Unicode strings. Micropython is aimed at embedded > devices, and so minimizing memory use is important, possibly even > more important than performa

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Nick Coghlan
On 5 June 2014 22:37, Paul Sokolovsky wrote: > On Thu, 5 Jun 2014 22:20:04 +1000 > Nick Coghlan wrote: >> problems caused by trusting the locale encoding to be correct, but the >> startup code will need non-trivial changes for that to happen - the >> C.UTF-8 locale may even become widespread befo

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Nick Coghlan
On 5 June 2014 22:10, Stefan Krah wrote: > Paul Sokolovsky wrote: >> In this regard, I'm glad to participate in mind-resetting discussion. >> So, let's reiterate - there's nothing like "the best", "the only right", >> "the only correct", "righter than", "more correct than" in CPython's >> impleme

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Paul Sokolovsky
Hello, On Thu, 5 Jun 2014 22:20:04 +1000 Nick Coghlan wrote: [] > problems caused by trusting the locale encoding to be correct, but the > startup code will need non-trivial changes for that to happen - the > C.UTF-8 locale may even become widespread before we get there). ... And until those go

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Tim Delaney
On 5 June 2014 22:01, Paul Sokolovsky wrote: > > All these changes are what let me dream on and speculate on > possibility that Python4 could offer an encoding-neutral string type > (which means based on bytes) > To me, an "encoding neutral string type" means roughly "characters are atomic", and

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Nick Coghlan
On 5 June 2014 22:01, Paul Sokolovsky wrote: >> Aside from >> some of the POSIX locale handling issues on Linux, many of the >> concerns are with the usability of bytes and bytearray, not with str - >> that's why binary interpolation is coming back in 3.5, and there will >> likely be other usabili

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Stefan Krah
Paul Sokolovsky wrote: > In this regard, I'm glad to participate in mind-resetting discussion. > So, let's reiterate - there's nothing like "the best", "the only right", > "the only correct", "righter than", "more correct than" in CPython's > implementation of Unicode storage. It is *arbitrary*. W

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Paul Sokolovsky
Hello, On Thu, 5 Jun 2014 21:43:16 +1000 Nick Coghlan wrote: > On 5 June 2014 21:25, Paul Sokolovsky wrote: > > Well, I understand the plan - hoping that people will "get over > > this". And I'm personally happy to stay away from this "trolling", > > but any discussion related to Unicode goes i

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Nick Coghlan
On 5 June 2014 21:25, Paul Sokolovsky wrote: > Well, I understand the plan - hoping that people will "get over this". > And I'm personally happy to stay away from this "trolling", but any > discussion related to Unicode goes in circles and returns to feeling > that Unicode at the central role as p

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Nick Coghlan
On 5 June 2014 17:54, Stephen J. Turnbull wrote: > What matters to you is that str (unicode) is an opaque type -- there > is no specification of the internal representation in the language > reference, and in fact several different ones coexist happily across > existing Python implementations -- a

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Paul Sokolovsky
Hello, On Thu, 05 Jun 2014 16:54:11 +0900 "Stephen J. Turnbull" wrote: > Paul Sokolovsky writes: > > > Please put that in perspective when alarming over O(1) indexing of > > inherently problematic niche datatype. (Again, it's not my or > > MicroPython's fault that it was forced as standard s

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Paul Sokolovsky
Hello, On Wed, 04 Jun 2014 22:15:30 -0400 Terry Reedy wrote: > On 6/4/2014 6:52 PM, Paul Sokolovsky wrote: > > > "Well" is subjective (or should be defined formally based on the > > requirements). With my MicroPython hat on, an implementation which > > receives a string, transcodes it, leading

[Python-Dev] Request: new "Asyncio" component on the bug tracker

2014-06-05 Thread Victor Stinner
Hi, Would it be possible to add a new "Asyncio" component on bugs.python.org? If this component is selected, the default nosy list for asyncio would be used (guido, yury and me, there is already such list in the nosy list completion). Full text search for "asyncio" returns too many results. Vict

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Stephen J. Turnbull
Serhiy Storchaka writes: > Yes, I remember. I think that hybrid FSR-UTF16 (like FSR, but UTF-16 is > used instead of UCS4) is the better choice for CPython. I suppose that > with populating emoticons and other icon characters in nearest 5 or 10 > years, even English text will often contain

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Serhiy Storchaka
05.06.14 05:25, Terry Reedy wrote: I mentioned it as an alternative during the '393 discussion. I more than half agree that the FSR is the better choice for CPython, which had no particular attachment to UTF-16 in the way that I think Jython, for instance, does. Yes, I remember. I think t

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Stephen J. Turnbull
Paul Sokolovsky writes: > Please put that in perspective when alarming over O(1) indexing of > inherently problematic niche datatype. (Again, it's not my or > MicroPython's fault that it was forced as standard string type. Maybe > if CPython seriously considered now-standard UTF-8 encoding, re

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Serhiy Storchaka
04.06.14 23:50, Glenn Linderman wrote: 3) (Most space efficient) One cached entry, that caches the last codepoint/byte position referenced. UTF-8 is able to be traversed in either direction, so "next/previous" codepoint access would be relatively fast (and such are very common operations, e
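The single-entry cache described in option 3 could look roughly like the following sketch. `CachedUtf8` is a hypothetical class written for illustration only; it assumes well-formed UTF-8 and non-negative indices, and it is not how any existing implementation stores strings.

```python
class CachedUtf8:
    """UTF-8 backed string that remembers the last (code point index, byte offset)."""

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self._cp = 0     # code point index of the cached position
        self._byte = 0   # byte offset of the cached position

    def _step_forward(self, pos):
        # Skip the lead byte, then any continuation bytes (10xxxxxx).
        pos += 1
        while pos < len(self.data) and (self.data[pos] & 0xC0) == 0x80:
            pos += 1
        return pos

    def _step_backward(self, pos):
        # Walk back over continuation bytes to the previous lead byte.
        pos -= 1
        while (self.data[pos] & 0xC0) == 0x80:
            pos -= 1
        return pos

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indices are not handled in this sketch")
        cp, pos = self._cp, self._byte
        if index < cp - index:
            # Target is closer to the start than to the cached position.
            cp, pos = 0, 0
        while cp < index:
            if pos >= len(self.data):
                raise IndexError("string index out of range")
            pos = self._step_forward(pos)
            cp += 1
        while cp > index:
            pos = self._step_backward(pos)
            cp -= 1
        if pos >= len(self.data):
            raise IndexError("string index out of range")
        end = self._step_forward(pos)
        self._cp, self._byte = cp, pos   # update the cache only on success
        return self.data[pos:end].decode("utf-8")


text = "a\u00e9\u20ac\U0001F600z"
s = CachedUtf8(text)
order = [0, 4, 3, 1, 2, 4, 0]
fetched = [s[i] for i in order]
```

Accesses next to the previous one cost a single step in either direction, while a jump far from the cached position still degrades to a linear scan, which is the limitation the replies in this thread point out.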

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Serhiy Storchaka
05.06.14 03:03, Greg Ewing wrote: Serhiy Storchaka wrote: html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. Common use case is quickly find substring starting from current position using str.find or
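The index-driven scanning style described above looks roughly like this. It is a minimal sketch with a hypothetical helper `find_tags`, far simpler than what `html.HTMLParser` actually does, and it is shown only to illustrate why such code leans on cheap integer positions and `str.find` with a start offset.

```python
def find_tags(text):
    """Collect the contents of <...> spans by hopping between positions with str.find."""
    tags = []
    pos = 0
    while True:
        start = text.find("<", pos)        # search from the current position
        if start == -1:
            break
        end = text.find(">", start + 1)
        if end == -1:
            break
        tags.append(text[start + 1:end])   # slice by index
        pos = end + 1                      # resume scanning after the tag
    return tags


tags = find_tags("<p>hi <b>there</b></p>")
print(tags)
```

Every loop iteration passes an integer offset back into `str.find` and slices by index, so a representation where locating an offset is O(N) would make each of those steps a scan from the beginning of the string.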

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Serhiy Storchaka
05.06.14 03:08, Greg Ewing wrote: Serhiy Storchaka wrote: A language which doesn't support O(1) indexing is not Python, it is only Python-like language. That's debatable, but even if it's true, I don't think there's anything wrong with MicroPython being only a "Python-like language". As

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-05 Thread Stephen J. Turnbull
Glenn Linderman writes: > 3) (Most space efficient) One cached entry, that caches the last > codepoint/byte position referenced. UTF-8 is able to be traversed in > either direction, so "next/previous" codepoint access would be > relatively fast (and such are very common operations, even whe