On 28/07/2013 20:23, wxjmfa...@gmail.com wrote:
[snip]

Compare these (a BDFL example, where I'm using a non-ASCII char)

Py 3.2 (narrow build)

Why are you using a narrow build of Python 3.2? It doesn't treat all
codepoints equally (those outside the BMP can't be stored in one code
unit) and, therefore, it isn't "Unicode compliant"!
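You can check this directly on a current interpreter; a small sketch (assuming Python 3.3+, where every build behaves like a wide build):

```python
import sys

# On a wide build (or any Python >= 3.3), sys.maxunicode is 0x10FFFF and a
# character outside the BMP is a single code point. On a narrow 3.2 build,
# sys.maxunicode was 0xFFFF and the same character was stored as a
# surrogate pair, so len() reported 2.
astral = '\U00010010'  # a character outside the BMP
print(hex(sys.maxunicode))
print(len(astral))  # 1 on 3.3+; 2 on a narrow build
```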

>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.09897159682121348
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.09079501961732461
>>> sys.getsizeof('d')
32
>>> sys.getsizeof('€')
32
>>> sys.getsizeof('dd')
34
>>> sys.getsizeof('d€')
34


Py3.3

>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.12183182740848858
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.2365732969632326
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('€')
40
>>> sys.getsizeof('dd')
27
>>> sys.getsizeof('d€')
42

Tell me which one seems to be more "Unicode compliant"?
The goal of Unicode is to handle every char "equally".

Now, the problem: memory. Do not forget that an "FSR"-style
mechanism is *irrelevant* for a non-ASCII user. As soon as
one uses a single non-ASCII character, the ASCII optimisation
is lost. (That's why we have all these dedicated coding
schemes, the UTFs included.)

>>> sys.getsizeof('abc' * 1000 + 'z')
3026
>>> sys.getsizeof('abc' * 1000 + '\U00010010')
12044
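The effect is easy to reproduce; a sketch (Python 3.3+ assumed, exact sizes will differ across versions):

```python
import sys

base = 'abc' * 1000               # pure ASCII: ~1 byte per character
upgraded = base + '\U00010010'    # one astral char upgrades the whole
                                  # string to ~4 bytes per character
print(sys.getsizeof(base))
print(sys.getsizeof(upgraded))    # roughly 4x the payload of `base`
```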

A little secret: the larger a repertoire of characters
is, the more bits you need.
Secret #2: you cannot escape from this.
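The same trade-off shows up in the UTF encoding schemes themselves; a quick illustration in UTF-8, where characters from higher parts of the repertoire need more bytes:

```python
# UTF-8 is variable-width: ASCII takes 1 byte, the euro sign (U+20AC)
# takes 3, and a character outside the BMP takes 4.
for ch in ('d', '€', '\U00010010'):
    print(repr(ch), len(ch.encode('utf-8')))
```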


jmf


--
http://mail.python.org/mailman/listinfo/python-list
