On 28/07/2013 20:23, wxjmfa...@gmail.com wrote:
[snip]

Compare these (a BDFL example, where I'm using a non-ASCII char)

Py 3.2 (narrow build)

Why are you using a narrow build of Python 3.2? It doesn't treat all
codepoints equally (those outside the BMP can't be stored in one code
unit) and, therefore, it isn't "Unicode compliant"!
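You can check this directly on a current interpreter; a small sketch (assuming Python 3.3+, where every build behaves like a wide build):

```python
import sys

# On a wide build (or any Python >= 3.3), sys.maxunicode is 0x10FFFF and a
# character outside the BMP is a single code point. On a narrow 3.2 build,
# sys.maxunicode was 0xFFFF and the same character was stored as a
# surrogate pair, so len() reported 2.
astral = '\U00010010'  # a character outside the BMP
print(hex(sys.maxunicode))
print(len(astral))  # 1 on 3.3+; 2 on a narrow build
```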

>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.09897159682121348
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.09079501961732461
>>> sys.getsizeof('d')
32
>>> sys.getsizeof('€')
32
>>> sys.getsizeof('dd')
34
>>> sys.getsizeof('d€')
34


Py3.3

>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.12183182740848858
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.2365732969632326
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('€')
40
>>> sys.getsizeof('dd')
27
>>> sys.getsizeof('d€')
42

Tell me which one seems to be more "Unicode compliant"?
The goal of Unicode is to handle every char "equally".

Now, the problem: memory. Do not forget that an "FSR"-style
mechanism is *irrelevant* for a non-ASCII user. As soon as
one uses a single non-ASCII character, the ASCII optimisation
is lost. (That's why we have all these dedicated coding
schemes, the UTFs included.)

>>> sys.getsizeof('abc' * 1000 + 'z')
3026
>>> sys.getsizeof('abc' * 1000 + '\U00010010')
12044
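The effect is easy to reproduce; a sketch (Python 3.3+ assumed, exact sizes will differ across versions):

```python
import sys

base = 'abc' * 1000               # pure ASCII: ~1 byte per character
upgraded = base + '\U00010010'    # one astral char upgrades the whole
                                  # string to ~4 bytes per character
print(sys.getsizeof(base))
print(sys.getsizeof(upgraded))    # roughly 4x the payload of `base`
```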

A little secret: the larger a repertoire of characters
is, the more bits you need.
Secret #2: you cannot escape from this.
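The same trade-off shows up in the UTF encoding schemes themselves; a quick illustration in UTF-8, where characters from higher parts of the repertoire need more bytes:

```python
# UTF-8 is variable-width: ASCII takes 1 byte, the euro sign (U+20AC)
# takes 3, and a character outside the BMP takes 4.
for ch in ('d', '€', '\U00010010'):
    print(repr(ch), len(ch.encode('utf-8')))
```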


jmf


--
http://mail.python.org/mailman/listinfo/python-list
