Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Terry Reedy
On 6/4/2014 6:54 PM, Serhiy Storchaka wrote: 05.06.14 00:21, Terry Reedy wrote: On 6/4/2014 3:41 AM, Jeff Allen wrote: Jython uses UTF-16 internally -- probably the only sensible choice in a Python that can call Java. Indexing is O(N), fundamentally. By "fundamentally", I mean for those s

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Terry Reedy
On 6/4/2014 6:52 PM, Paul Sokolovsky wrote: "Well" is subjective (or should be defined formally based on the requirements). With my MicroPython hat on, an implementation which receives a string, transcodes it, leading to bigger size, just to immediately transcode back and send out - is awful, en

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Thu, 05 Jun 2014 12:08:21 +1200 Greg Ewing wrote: > Serhiy Storchaka wrote: > > A language which doesn't support O(1) indexing is not Python, it is > > only Python-like language. > > That's debatable, but even if it's true, I don't think > there's anything wrong with MicroPython being

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Thu, Jun 5, 2014 at 10:03 AM, Greg Ewing wrote: > StringPositions could support the following operations: > >StringPosition + int --> StringPosition >StringPosition - int --> StringPosition >StringPosition - StringPosition --> int > > These would be computed by counting characters f
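
A minimal sketch of the StringPosition arithmetic listed in the message above, assuming the string is stored as a UTF-8 byte buffer. The class layout, the `offset` attribute and the `_is_continuation` helper are illustrative assumptions, not code from the thread or from MicroPython. Each operation counts characters by skipping UTF-8 continuation bytes, so it is O(distance moved) rather than O(1).

```python
class StringPosition:
    """Hypothetical cursor into UTF-8 data; stores only a byte offset."""

    def __init__(self, data, offset=0):
        self.data = data      # bytes holding valid UTF-8
        self.offset = offset  # byte offset of the start of a code point

    @staticmethod
    def _is_continuation(byte):
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return byte & 0xC0 == 0x80

    def __add__(self, n):
        # StringPosition + int --> StringPosition
        if n < 0:
            return self - (-n)
        off = self.offset
        for _ in range(n):
            if off >= len(self.data):
                raise IndexError("position past end of string")
            off += 1
            while off < len(self.data) and self._is_continuation(self.data[off]):
                off += 1
        return StringPosition(self.data, off)

    def __sub__(self, other):
        if isinstance(other, StringPosition):
            # StringPosition - StringPosition --> int (signed code point distance)
            lo, hi = sorted((other.offset, self.offset))
            count = sum(1 for b in self.data[lo:hi] if not self._is_continuation(b))
            return count if self.offset >= other.offset else -count
        # StringPosition - int --> StringPosition
        if other < 0:
            return self + (-other)
        off = self.offset
        for _ in range(other):
            if off <= 0:
                raise IndexError("position before start of string")
            off -= 1
            while off > 0 and self._is_continuation(self.data[off]):
                off -= 1
        return StringPosition(self.data, off)
```

With this shape, a parser can keep a position and advance it by small steps without ever converting back to a code point index.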

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Thu, 05 Jun 2014 12:03:17 +1200 Greg Ewing wrote: > Serhiy Storchaka wrote: > > html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize > > don't use iterators. They use indices, str.find and/or regular > > expressions. A common use case is to quickly find a substring starting > > fr

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Greg Ewing
Glenn Linderman wrote: so algorithms that walk two strings at a time cannot use the same StringPosition to do so... yep, this is quite divergent from CPython and Python. They can, it's just that at most one of the indexing operations would be fast; the StringPosition would devolve into an in

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Greg Ewing
Glenn Linderman wrote: For that kind of thing, you don't need an actual character index, just some way of referring to a place in a string. I think you meant codepoint index, rather than character index. Probably, but what I said is true either way. This starts to diverge from Python code

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Glenn Linderman
On 6/4/2014 5:08 PM, Glenn Linderman wrote: On 6/4/2014 5:03 PM, Greg Ewing wrote: Serhiy Storchaka wrote: html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. A common use case is to quickly find a substring star

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Glenn Linderman
On 6/4/2014 5:03 PM, Greg Ewing wrote: Serhiy Storchaka wrote: html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. A common use case is to quickly find a substring starting from the current position using str.find or

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Greg Ewing
Serhiy Storchaka wrote: A language which doesn't support O(1) indexing is not Python, it is only Python-like language. That's debatable, but even if it's true, I don't think there's anything wrong with MicroPython being only a "Python-like language". As has been pointed out, fitting Python onto

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Greg Ewing
Serhiy Storchaka wrote: html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. A common use case is to quickly find a substring starting from the current position using str.find or re.search, process the found token, advance

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Eric Snow
On Wed, Jun 4, 2014 at 5:11 PM, Paul Sokolovsky wrote: > On Wed, 4 Jun 2014 16:12:23 -0600 > Eric Snow wrote: >> Actually, there is a "formal, implementation-independent language >> spec": >> >> https://docs.python.org/3/reference/ > > Opening that link in browser, pressing Ctrl+F and pasting you

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
05.06.14 00:21, Terry Reedy wrote: On 6/4/2014 3:41 AM, Jeff Allen wrote: Jython uses UTF-16 internally -- probably the only sensible choice in a Python that can call Java. Indexing is O(N), fundamentally. By "fundamentally", I mean for those strings that have not yet noticed that they con

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 4 Jun 2014 16:12:23 -0600 Eric Snow wrote: > On Wed, Jun 4, 2014 at 3:14 PM, Paul Sokolovsky > wrote: > > That said, and unlike previous attempts to develop small Python > > implementations (which of course existed), we're striving to be > > exactly a Python language implementa

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Thu, Jun 5, 2014 at 8:52 AM, Paul Sokolovsky wrote: > "Well" is subjective (or should be defined formally based on the > requirements). With my MicroPython hat on, an implementation which > receives a string, transcodes it, leading to bigger size, just to > immediately transcode back and send o

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
05.06.14 01:04, Terry Reedy wrote: PS. You do not seem to be aware of how well the current PEP393 implementation works. If you are going to write any more about it, I suggest you run Tools/Stringbench/stringbench.py for timings. AFAIK stringbench is ASCII-only, so it likely is compatible

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 04 Jun 2014 18:04:52 -0400 Terry Reedy wrote: > On 6/4/2014 5:14 PM, Paul Sokolovsky wrote: > > > That said, and unlike previous attempts to develop small Python > > implementations (which of course existed), we're striving to be > > exactly a Python language implementation, no

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Eric Snow
On Wed, Jun 4, 2014 at 3:14 PM, Paul Sokolovsky wrote: > That said, and unlike previous attempts to develop small Python > implementations (which of course existed), we're striving to be exactly > a Python language implementation, not a Python-like language > implementation. As there's no formal

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Terry Reedy
On 6/4/2014 5:14 PM, Paul Sokolovsky wrote: That said, and unlike previous attempts to develop small Python implementations (which of course existed), we're striving to be exactly a Python language implementation, not a Python-like language implementation. As there's no formal, implementation-

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Glenn Linderman
On 6/4/2014 2:28 PM, Chris Angelico wrote: On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman wrote: 8) (Content specific variable size caches) Index each codepoint that is a different byte size than the previous codepoint, allowing indexing to be used in the intervals. Worst case size is like 2,

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread R. David Murray
On Thu, 05 Jun 2014 00:14:32 +0300, Paul Sokolovsky wrote: > That said, and unlike previous attempts to develop small Python > implementations (which of course existed), we're striving to be exactly > a Python language implementation, not a Python-like language > implementation. As there's no fo

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman wrote: > 8) (Content specific variable size caches) Index each codepoint that is a > different byte size than the previous codepoint, allowing indexing to be > used in the intervals. Worst case size is like 2, best case size is a single > entry for

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Terry Reedy
On 6/4/2014 3:41 AM, Jeff Allen wrote: Jython uses UTF-16 internally -- probably the only sensible choice in a Python that can call Java. Indexing is O(N), fundamentally. By "fundamentally", I mean for those strings that have not yet noticed that they contain no supplementary (>0xFFFF) characters

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Terry Reedy
On 6/4/2014 3:41 AM, Jeff Allen wrote: Jython uses UTF-16 internally -- probably the only sensible choice in a Python that can call Java. Indexing is O(N), fundamentally. By "fundamentally", I mean for those strings that have not yet noticed that they contain no supplementary (>0xFFFF) characters

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 4 Jun 2014 11:25:51 -0700 Guido van Rossum wrote: > This thread has devolved into a flame war. I think we should trust the > Micropython implementers (whoever they are -- are they participating > here?) I'm a regular contributor. I'm not sure if the author, Damien George, is on

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Glenn Linderman
On 6/4/2014 6:14 AM, Steve Dower wrote: I agree with Daniel. Directly indexing into text suggests an attempted optimization that is likely to be incorrect for a set of strings. Splitting, regex, concatenation and formatting are really the main operations that matter, and MicroPython can optim

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Steven D'Aprano
On Wed, Jun 04, 2014 at 03:32:25PM +, Steve Dower wrote: > Steven D'Aprano wrote: > > The language semantics says that a string is an array of code points. Every > > index relates to a single code point, no code point extends over two or more > > indexes. > > There's a 1:1 relationship between

Re: [Python-Dev] Should standard library modules optimize for CPython?

2014-06-04 Thread Stefan Behnel
Sturla Molden, 03.06.2014 22:51: > Stefan Behnel wrote: >> So the >> argument in favour is mostly a pragmatic one. If you can have 2-5x faster >> code essentially for free, why not just go for it? > > It would be easier if the GIL or Cython's use of it was redesigned. Cython > just grabs the GIL an

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 04 Jun 2014 20:52:14 +0300 Serhiy Storchaka wrote: [] > > That's sad, I agree. > > Other languages (Go, Rust) can be happy without O(1) indexing of > strings. All string and regex operations work with iterators or > cursors, and I believe this approach is not significantly worse t

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Guido van Rossum
This thread has devolved into a flame war. I think we should trust the Micropython implementers (whoever they are -- are they participating here?) to know their users and let them do what feels right to them. We should just ask them not to claim full compatibility with any particular Python version

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Stephen J. Turnbull
Serhiy Storchaka writes: > It would be interesting to collect a statistic about how many indexing > operations happened during the life of a string in typical (Micro)Python > program. Probably irrelevant (I doubt anybody is going to be writing programmers' editors in MicroPython), but by far

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
04.06.14 20:05, Paul Sokolovsky wrote: On Wed, 04 Jun 2014 19:49:18 +0300 Serhiy Storchaka wrote: html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. A common use case is to quickly find a substring starting

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
04.06.14 17:49, Paul Sokolovsky wrote: On Thu, 5 Jun 2014 00:26:10 +1000 Chris Angelico wrote: On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka wrote: 04.06.14 10:03, Chris Angelico wrote: Right, which is why I don't like the idea. But you don't need non-ASCII characters to blin

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
04.06.14 19:52, MRAB wrote: In order to avoid indexing, you could use some kind of 'cursor' class to step forwards and backwards along strings. The cursor could include both the codepoint index and the byte index. So you need a different string library and a different regular expression libr

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 04 Jun 2014 19:49:18 +0300 Serhiy Storchaka wrote: [] > > But show me real-world case for that. Common usecase is scanning > > string left-to-right, that should be done using iterator and thus > > O(N). Right-to-left scanning would be order(s) of magnitude less > > frequent, as an

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread MRAB
On 2014-06-04 14:33, Nick Coghlan wrote: On 4 June 2014 15:39, wrote: On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote: There's a general expectation that indexing will be O(1) because all the builtin containers that support that syntax use it for O(1) lookup operations. Depend

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
04.06.14 18:38, Paul Sokolovsky wrote: Any non-trivial text parsing uses indices or regular expressions (and regular expressions themselves use indices internally). I keep hearing this stuff, and unfortunately so far don't have enough time to collect all that stuff and provide detailed resp

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread INADA Naoki
For Jython and IronPython, UTF-16 may be best internal encoding. Recent languages (Swiffy, Golang, Rust) chose UTF-8 as internal encoding. Using utf-8 is simple and efficient. For example, no need for utf-8 copy of the string when writing to file and serializing to JSON. When implementing Python

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Thu, 5 Jun 2014 01:00:52 +1000 Chris Angelico wrote: > On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky > wrote: > >> > But you need non-ASCII characters to display a title of MP3 > >> > track. > > > > Yes, but to display a title, you don't need to do codepoint access > > at random -

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Mark Lawrence
On 04/06/2014 16:32, Steve Dower wrote: If copying into a separate list is a problem (memory-wise), re.finditer('\\S+', string) also provides the same behaviour and gives me the sliced string, so there's no need to index for anything. Out of idle curiosity is there anything that stops Micro
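
The re.finditer approach quoted above can be sketched as follows. This uses CPython's standard `re` module; the helper name `iter_words` is made up for illustration, and nothing here is claimed about what MicroPython's own regex module supports. Each match object already carries the matched text and its span, so the caller never indexes into the string by hand and no intermediate list of words is built.

```python
import re


def iter_words(text):
    """Yield (word, start, end) for each run of non-whitespace in text."""
    for match in re.finditer(r"\S+", text):
        # match.group() is the sliced word; match.span() gives its bounds.
        yield match.group(), match.start(), match.end()


if __name__ == "__main__":
    for word, start, end in iter_words("blink  the LED"):
        print(word, start, end)
```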

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Steve Dower
Paul Sokolovsky wrote: > You just shouldn't write inefficient programs, voila. But if you want, you > can keep writing inefficient programs, they just will be inefficient. Peace. Can I nominate this for QOTD? :) Cheers, Steve

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 04 Jun 2014 17:40:14 +0300 Serhiy Storchaka wrote: > 04.06.14 17:02, Paul Moore wrote: > > On 4 June 2014 14:39, Serhiy Storchaka wrote: > >> I think that breaking the O(1) expectation for indexing makes the > >> implementation significantly incompatible with Python. Virtually al

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Steve Dower
Steven D'Aprano wrote: > The language semantics says that a string is an array of code points. Every > index relates to a single code point, no code point extends over two or more > indexes. > There's a 1:1 relationship between code points and indexes. How is direct > indexing "likely to be incorre

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Daniel Holth
On Wed, Jun 4, 2014 at 10:12 AM, Steven D'Aprano wrote: > On Wed, Jun 04, 2014 at 01:14:04PM +, Steve Dower wrote: >> I agree with Daniel. Directly indexing into text suggests an >> attempted optimization that is likely to be incorrect for a set of >> strings. > > I'm afraid I don't understa

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
04.06.14 17:02, Paul Moore wrote: On 4 June 2014 14:39, Serhiy Storchaka wrote: I think that breaking the O(1) expectation for indexing makes the implementation significantly incompatible with Python. Virtually all string operations in Python operate with indices. I don't use indexing on str

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky wrote: >> > But you need non-ASCII characters to display a title of MP3 track. > > Yes, but to display a title, you don't need to do codepoint access at > random - you need to either take a block of memory (length in bytes) and > do something with i

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Thu, 5 Jun 2014 00:26:10 +1000 Chris Angelico wrote: > On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka > wrote: > > 04.06.14 10:03, Chris Angelico wrote: > > > >> Right, which is why I don't like the idea. But you don't need > >> non-ASCII characters to blink an LED or turn a

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Steven D'Aprano
On Wed, Jun 04, 2014 at 01:38:57PM +0300, Paul Sokolovsky wrote: > That's another reason why people don't like Unicode enforced upon them Enforcing design and language decisions is the job of the programming language. You might as well complain that Python forces C doubles as the floating point

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka wrote: > 04.06.14 10:03, Chris Angelico wrote: > >> Right, which is why I don't like the idea. But you don't need >> non-ASCII characters to blink an LED or turn a servo, and there is >> significant resistance to the notion that appending a n

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
04.06.14 10:03, Chris Angelico wrote: Right, which is why I don't like the idea. But you don't need non-ASCII characters to blink an LED or turn a servo, and there is significant resistance to the notion that appending a non-ASCII character to a long ASCII-only string requires the whole str

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Steven D'Aprano
On Wed, Jun 04, 2014 at 01:14:04PM +, Steve Dower wrote: > I agree with Daniel. Directly indexing into text suggests an > attempted optimization that is likely to be incorrect for a set of > strings. I'm afraid I don't understand this argument. The language semantics says that a string i

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Moore
On 4 June 2014 14:39, Serhiy Storchaka wrote: > I think that breaking the O(1) expectation for indexing makes the implementation > significantly incompatible with Python. Virtually all string operations in > Python operate with indices. I don't use indexing on strings except in rare situations. Sure I

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Daniel Holth
MicroPython is going to be significantly incompatible with Python anyway. But you should be able to run your mp code on regular Python. On Wed, Jun 4, 2014 at 9:39 AM, Serhiy Storchaka wrote: > 04.06.14 04:17, Steven D'Aprano wrote: > >> Would either of these trade-offs be acceptable while

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Serhiy Storchaka
04.06.14 04:17, Steven D'Aprano wrote: Would either of these trade-offs be acceptable while still claiming "Python 3.4 compatibility"? My own feeling is that O(1) string indexing operations are a quality of implementation issue, not a deal breaker to call it a Python. I can't see any requi

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Nick Coghlan
On 4 June 2014 15:39, wrote: > On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote: > >> There's a general expectation that indexing will be O(1) because all >> the builtin containers that support that syntax use it for O(1) lookup >> operations. > > Depending on your definition of built

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Mark Lawrence
On 04/06/2014 11:53, Paul Sokolovsky wrote: Hello, On Tue, 3 Jun 2014 22:23:07 -0700 Guido van Rossum wrote: [] Never mind disabling assertions -- even with enabled assertions you'd have to expect most Python programs to fail with non-ASCII input. Then again the UTF-8 option would be pretty

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Steve Dower
I agree with Daniel. Directly indexing into text suggests an attempted optimization that is likely to be incorrect for a set of strings. Splitting, regex, concatenation and formatting are really the main operations that matter, and MicroPython can optimize their implementation of these easily

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 4 Jun 2014 21:17:12 +1000 Chris Angelico wrote: > On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky > wrote: > > An alternative view is that the discussion on the tracker showed > > Python developers' mind-fixation on implementing something the way > > CPython does it. And I didn't

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Daniel Holth
If we're voting I think representing Unicode internally in micropython as utf-8 with O(N) indexing is a great idea, partly because I'm not sure indexing into strings is a good idea - lots of Unicode code points don't make sense by themselves; see also grapheme clusters. It would probably work great

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Kristján Valur Jónsson
For those that haven't seen this: http://www.utf8everywhere.org/ > -Original Message- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames@python.org] On Behalf Of Donald Stufft > Sent: 4 June 2014 01:46 > To: Steven D'Aprano > Cc: python-dev@python.org > Subject: Re: [

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 4 Jun 2014 20:53:46 +1000 Chris Angelico wrote: > On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky > wrote: > > And I'm saying that not to discourage Unicode addition to > > MicroPython, but to hint that "force-force" approach implemented by > > CPython3 and causing rage and split

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Daniel Holth
Can of worms, opened. On Jun 4, 2014 7:20 AM, "Chris Angelico" wrote: > On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky wrote: > > An alternative view is that the discussion on the tracker showed Python > > developers' mind-fixation on implementing something the way CPython does > > it. And I di

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky wrote: > An alternative view is that the discussion on the tracker showed Python > developers' mind-fixation on implementing something the way CPython does > it. And I didn't yet go to that argument, but in the end, MicroPython > does not try to rewr

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Tue, 3 Jun 2014 22:23:07 -0700 Guido van Rossum wrote: [] > Never mind disabling assertions -- even with enabled assertions you'd > have to expect most Python programs to fail with non-ASCII input. > > Then again the UTF-8 option would be pretty devastating too for > anything manipula

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 4 Jun 2014 17:03:22 +1000 Chris Angelico wrote: [] > > Why not support variable-width strings like CPython 3.4? > > That was my first recommendation, and in fact I started writing code > to implement parts of PEP 393, with a view to basically doing it the > same way in both Pyth

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Paul Sokolovsky
Hello, On Wed, 4 Jun 2014 12:32:12 +1000 Chris Angelico wrote: > On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano > wrote: > > * Having a build-time option to restrict all strings to ASCII-only. > > > > (I think what they mean by that is that strings will be like > > Python 2 strings, ASCII-p

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky wrote: > And I'm saying that not to discourage Unicode addition to MicroPython, > but to hint that "force-force" approach implemented by CPython3 and > causing rage and split in the community is not appreciated. FWIW, it's Python 3 (the language) an

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky wrote: > That's another reason why people don't like Unicode enforced upon them > - all the talk about supporting all languages and scripts is demagogy > and hypocrisy, given a choice, Unicode zealots would rather limit > people to Latin script than

[Python-Dev] Some notes about MicroPython from an observer

2014-06-04 Thread Daniel Holth
- micropython is designed to run on a machine with 192 kilobytes of RAM and perhaps a megabyte of FLASH. The controller can execute read-only code directly from FLASH. There is no dynamic linker in this environment. (It also has a UNIX port). - However it does include a full Python parser and REPL,

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Juraj Sukop
On Wed, Jun 4, 2014 at 11:36 AM, Stephen J. Turnbull wrote: > > I think you really need to check what the applications are in detail. > UTF-8 costs about 35% more storage for Japanese, and even more for > Chinese, than does UTF-16. "UTF-8 can be smaller even for Asian languages, e.g.: front pag
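
The size trade-off debated above can be checked directly. A minimal sketch, assuming the question is simply the encoded byte count; the helper name `encoded_sizes` and the sample strings are made up for illustration and are not taken from the thread. "utf-16-le" is used so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of text under UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no BOM in the count
    }


if __name__ == "__main__":
    # Pure CJK text: 3 bytes per character in UTF-8, 2 in UTF-16.
    print(encoded_sizes("\u65e5\u672c\u8a9e"))
    # The same text wrapped in ASCII markup shifts the balance toward UTF-8.
    print(encoded_sizes("<p>\u65e5\u672c\u8a9e</p>"))
```

Which encoding wins therefore depends on the mix of ASCII and non-ASCII characters in the data a given application actually handles.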

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Stephen J. Turnbull
dw+python-...@hmmz.org writes: > Given the specialized kinds of application this Python > implementation is targeted at, it seems UTF-8 is ideal considering > the huge memory savings resulting from the compressed > representation, I think you really need to check what the applications are in

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Jeff Allen
Jython uses UTF-16 internally -- probably the only sensible choice in a Python that can call Java. Indexing is O(N), fundamentally. By "fundamentally", I mean for those strings that have not yet noticed that they contain no supplementary (>0xFFFF) characters. I've toyed with making this O(1) u

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread dw+python-dev
On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote: > There's a general expectation that indexing will be O(1) because all > the builtin containers that support that syntax use it for O(1) lookup > operations. Depending on your definition of built in, there is at least one standard libr

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Wed, Jun 4, 2014 at 5:02 PM, wrote: > There are more things to consider for the internal implementation, > in particular how the string length is implemented. Several alternatives > exist: > 1. store the UTF-8 length (i.e. memory size) > 2. store the number of code points (i.e. Python len()) >
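
The first two alternatives above differ in what is cheap to recover. A rough sketch, assuming the string is held as valid UTF-8 bytes; the function name `utf8_code_point_count` is hypothetical. If only the byte length is stored, the code point count (what Python's len() reports) has to be recomputed with an O(N) scan that counts every byte that is not a continuation byte.

```python
def utf8_code_point_count(data):
    """Count code points in valid UTF-8 bytes by counting non-continuation bytes."""
    # Continuation bytes look like 10xxxxxx; every other byte starts a code point.
    return sum(1 for byte in data if byte & 0xC0 != 0x80)


if __name__ == "__main__":
    text = "na\u00efve \u20ac"
    data = text.encode("utf-8")
    print(len(data))                    # stored memory size in bytes
    print(utf8_code_point_count(data))  # recomputed code point count
```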

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread Chris Angelico
On Wed, Jun 4, 2014 at 3:23 PM, Guido van Rossum wrote: > On Tue, Jun 3, 2014 at 7:32 PM, Chris Angelico wrote: >> >> On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano >> wrote: >> > * Having a build-time option to restrict all strings to ASCII-only. >> > >> > (I think what they mean by that is

Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread martin
Quoting Steven D'Aprano: * Having a build-time option to restrict all strings to ASCII-only. (I think what they mean by that is that strings will be like Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) An ASCII-plus-arbitrary-bytes type called "str" would prevent claimi