Re: RE Module Performance

Chris Angelico Thu, 25 Jul 2013 03:11:58 -0700

On Thu, Jul 25, 2013 at 7:22 PM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> What I'm trying to say is that it is possible to use UTF-16 internally,
> but *not* assume that every code point (character) is represented by a
> single 2-byte unit. For example, the len() of a UTF-16 string should not
> be calculated by counting the number of bytes and dividing by two. You
> actually need to walk the string, inspecting each double-byte


Anything's possible. But the underlying representation can be changed
fairly easily (a relative term, of course: it's a lot of work, but it
can be done in a single release, with no deprecation period or
anything), so there's very little reason to keep using UTF-16
underneath. May as well switch to UTF-32 for convenience, or PEP 393
for convenience and efficiency, or maybe some other system that's
still mostly fixed-width.
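
To make the difference concrete, here is a rough sketch (not how any
real implementation does it) of what "walking the string" means for
UTF-16, next to the simple division that a fixed-width encoding like
UTF-32 allows. The function name utf16_code_point_count is made up for
illustration, and it assumes well-formed little-endian UTF-16 with no
BOM.

```python
import struct


def utf16_code_point_count(data: bytes) -> int:
    """Count code points in UTF-16-LE bytes (hypothetical helper).

    Assumes well-formed input with no BOM. Each code point is either one
    16-bit unit or a high/low surrogate pair, so we count every unit
    that is NOT a low (trailing) surrogate.
    """
    count = 0
    for (unit,) in struct.iter_unpack("<H", data):
        if not 0xDC00 <= unit <= 0xDFFF:
            count += 1
    return count


s = "a\U0001F600b"  # three code points, one of them outside the BMP

utf16 = s.encode("utf-16-le")
naive_len = len(utf16) // 2                  # bytes / 2: counts the pair twice
walked_len = utf16_code_point_count(utf16)   # O(n) walk over the code units

utf32 = s.encode("utf-32-le")
fixed_len = len(utf32) // 4                  # fixed width: plain division works

print(naive_len, walked_len, fixed_len)
```

With UTF-16 the correct length needs a pass over the data (or a cached
count), while with a fixed-width representation it falls straight out
of the byte count, and indexing stays O(1) as well.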

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
