On 5 June 2014 17:54, Stephen J. Turnbull <step...@xemacs.org> wrote:
> What matters to you is that str (unicode) is an opaque type -- there
> is no specification of the internal representation in the language
> reference, and in fact several different ones coexist happily across
> existing Python implementations -- and you're free to use a UTF-8
> implementation if that suits the applications you expect for
> MicroPython.

However, as others have noted in the thread, the critical thing is to
*not* let that internal implementation detail leak into the Python
level string behaviour. That's what happened with narrow builds of
Python 2 and pre-PEP-393 releases of Python 3 (effectively using
UTF-16 internally), and it caused enough bugs that Linux
distributions tend to accept the memory cost of wide builds (4 bytes
per code point) for affected versions instead.
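
To make the leak concrete, here is a rough sketch that imitates what a
narrow build reported, by counting UTF-16 code units by hand. The
helper name utf16_code_units is made up for this illustration, and the
comparison assumes a PEP 393 interpreter (CPython 3.3 or later), where
len() counts code points.

```python
def utf16_code_units(text):
    """Count UTF-16 code units, roughly what len() reported on narrow builds.

    Hypothetical helper for illustration only, not a CPython API.
    """
    # utf-16-le emits no BOM, so every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


# One code point outside the Basic Multilingual Plane between two ASCII letters.
sample = "a\U0001F600b"

code_point_len = len(sample)                 # code points (PEP 393 builds)
code_unit_len = utf16_code_units(sample)     # code units (narrow-build view)

print(code_point_len, code_unit_len)
```

The two numbers differ because the non-BMP character needs a surrogate
pair in UTF-16. On a narrow build that difference was visible through
len(), indexing and slicing, which is exactly the kind of leak to
avoid.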

Preserving the semantics that "the Python 3 str type is an immutable
array of code points" matters significantly more than whether or not
indexing by code point is O(1). The various caching tricks suggested
in this thread (especially "leading ASCII characters", "trailing ASCII
characters" and "position & index of last lookup") could keep the
typical lookup performance well below O(N).
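
As a minimal sketch of two of those tricks (the leading ASCII prefix
and the position of the last lookup), something along these lines
would keep code point semantics on top of a UTF-8 buffer. The class
name Utf8Str and all of its attributes are invented for this example.
It is written in Python purely for readability and is not how
MicroPython or any other implementation actually does it.

```python
class Utf8Str:
    """Hypothetical UTF-8-backed string indexed by code point."""

    def __init__(self, text):
        self._buf = text.encode("utf-8")
        # Length of the leading ASCII run: below this, index == byte offset.
        n = 0
        while n < len(self._buf) and self._buf[n] < 0x80:
            n += 1
        self._ascii_prefix = n
        # Every byte that is not a continuation byte (10xxxxxx) starts a code point.
        self._length = sum(1 for b in self._buf if (b & 0xC0) != 0x80)
        # Last lookup, stored as (code point index, byte offset).
        self._last = (0, 0)

    def __len__(self):
        return self._length

    def _width(self, offset):
        # Width in bytes of the UTF-8 sequence starting at offset.
        lead = self._buf[offset]
        if lead < 0x80:
            return 1
        if lead < 0xE0:
            return 2
        if lead < 0xF0:
            return 3
        return 4

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        # Fast path: inside the leading ASCII run, O(1).
        if index < self._ascii_prefix:
            return chr(self._buf[index])
        # Resume from the last lookup when it is usable, otherwise
        # restart from the end of the ASCII prefix.
        cp, off = self._last
        if cp > index or cp < self._ascii_prefix:
            cp = off = self._ascii_prefix
        while cp < index:
            off += self._width(off)
            cp += 1
        self._last = (cp, off)
        return self._buf[off:off + self._width(off)].decode("utf-8")


s = Utf8Str("abc\u00e9\U0001F600z")
print(len(s), [s[i] for i in range(len(s))])
```

With this layout, forward iteration by index costs one step per
lookup, and pure ASCII strings never leave the fast path. Only a
backward jump into non-ASCII territory falls back to a scan, and a
trailing ASCII cache could be added in the same way.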

> PEP 393 exists, of course, and specifies the current internal
> representation for CPython 3.  But I don't see anything in it that
> suggests it's mandated for any other implementation.

CPython is constrained by C API compatibility requirements, as well as
by the amount of internal code that would need to be rewritten to
handle a variable width encoding as the canonical internal
representation (the problems with Python 2 narrow builds mean we
already know variable width encodings aren't handled correctly by the
current code).

Implementations that share code with CPython, or try to mimic the C
API especially closely, may face similar restrictions. Outside that, I
think we're better off if alternative implementations are free to
experiment with different internal string representations.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia