Re: Unicode 7

MRAB Tue, 29 Apr 2014 11:15:02 -0700

On 2014-04-29 18:37, wxjmfa...@gmail.com wrote:

Let's see how ready Python is for the next Unicode version
(Unicode 7.0.0.Beta).

>>> import timeit
>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'")
[1.4027834829454946, 1.38714224331963, 1.3822586635296261]
>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = '\u0fce'")
[5.462776291480395, 5.4479432055423445, 5.447874284053398]

>>> # more interesting
>>> timeit.repeat("(x*1000 + y)[:-1]",
...     setup="x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')")
[1.3496489533188765, 1.328654286266783, 1.3300913977710707]

Although the third example is the fastest, it's also the wrong way to
handle Unicode:

>>> x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')
>>> t = (x*1000 + y)[:-1].decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 3000-3001: unexpected end of data
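
To spell out why the decode fails, here is a minimal sketch (the variable names `encoded_y`, `text`, `raw`, `decode_failed` and `safe` are mine, not from the original post). It assumes only the same `x` and `y` as above: U+0FCE takes three bytes in UTF-8, so `[:-1]` on the bytes cuts that character in the middle, whereas `[:-1]` on the str removes it whole.

```python
x, y = 'abc', '\u0fce'

# U+0FCE needs three bytes in UTF-8.
encoded_y = y.encode('utf-8')

# Slicing the str drops one whole code point.
text = (x * 1000 + y)[:-1]

# Slicing the bytes drops only the last byte of the 3-byte sequence,
# leaving a truncated (invalid) UTF-8 tail.
raw = (x.encode('utf-8') * 1000 + encoded_y)[:-1]

try:
    raw.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Trimming characters first and encoding afterwards keeps the data valid.
safe = text.encode('utf-8')

print(len(encoded_y), len(text), len(raw), decode_failed, len(safe))
```

So the bytes version is only fast because it is doing a different, and here incorrect, operation.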

Note 1:  "lookup" is not the problem.

Note 2: From Unicode.org: "[...] We strongly encourage [...] and test
them with their programs [...]"

-> Done.

jmf


--
https://mail.python.org/mailman/listinfo/python-list
