[issue31484] Cache single-character strings outside of the Latin1 range

Ezio Melotti Fri, 15 Sep 2017 16:06:09 -0700

Ezio Melotti added the comment:

The Greek sample includes 155 unique characters (including whitespace, 
punctuation, and the English characters at the beginning), so they can all fit 
in the cache.
The Chinese sample, however, includes 3695 unique characters (all within the 
BMP), probably causing many more cache misses and a slowdown due to the cache 
overhead.
The Chinese text you used for the test is also from some 700 years ago and 
uses traditional and vernacular Chinese, so the number of unique characters is 
higher than what you would normally encounter in modern Chinese.
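A rough way to see the effect is to count the unique characters in a sample 
and replay the text through a fixed-size cache. This is a minimal sketch only: 
it assumes a fully associative LRU cache with a caller-chosen capacity, which 
is not necessarily how the cache in the patch is organized. When the capacity 
is at least the number of unique characters, only the first occurrence of each 
character misses; when it is smaller, the cache can thrash.

```python
from collections import OrderedDict


def unique_stats(text):
    """Return (number of unique characters, whether all are in the BMP)."""
    chars = set(text)
    return len(chars), all(ord(c) <= 0xFFFF for c in chars)


def simulate_lru(text, capacity):
    """Replay `text` through a hypothetical LRU cache of single characters.

    Returns (hits, misses). This models a generic LRU cache, not the
    actual data structure used by the patch.
    """
    cache = OrderedDict()
    hits = misses = 0
    for ch in text:
        if ch in cache:
            hits += 1
            cache.move_to_end(ch)  # mark as most recently used
        else:
            misses += 1
            cache[ch] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits, misses
```

For example, `simulate_lru("abcabc", 3)` gives 3 hits and 3 misses, while 
`simulate_lru("abcabc", 2)` misses on every character, because each lookup 
evicts the entry that is needed next.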


----------

_______________________________________
Python tracker
<https://bugs.python.org/issue31484>
_______________________________________
_______________________________________________
Python-bugs-list mailing list