Re: [pypy-dev] Unicode encode/decode speed

Amaury Forgeot d'Arc Mon, 11 Feb 2013 08:21:15 -0800

2013/2/11 Eleytherios Stamatogiannakis <est...@gmail.com>

> Right now we are using PyPy's "codecs.utf_8_encode" and
> "codecs.utf_8_decode" to do this conversion.
>


It's the most direct way to use the utf-8 conversion functions.


> Is there a faster way to do these conversions (encoding, decoding) in
> PyPy? Does CPython do something more clever than PyPy, like storing
> unicodes with full ASCII char content, in an ASCII representation?
>

Over the years, utf-8 conversions have been heavily optimized in CPython:
allocating short buffers on the stack, using aligned reads, doing a quick
check for ascii-only content (data & 0x80808080)...
These are all things that PyPy does not do.
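
To illustrate the ascii-only quick check, here is a rough Python sketch of the
idea: read the input four bytes at a time and test the high bit of every byte
with a single mask. This is not CPython's actual C code; the function name
is_ascii_wordwise is made up for this example, and in pure Python the
technique is for illustration only, not a speedup.

```python
import struct

# High bit of each of the four bytes in a 32-bit word.
ASCII_MASK = 0x80808080


def is_ascii_wordwise(data):
    """Return True if every byte in `data` is below 0x80.

    Hypothetical helper: it mimics the word-at-a-time check in spirit only.
    """
    aligned = len(data) - (len(data) % 4)
    # Test four bytes per step; byte order is irrelevant for this mask.
    for (word,) in struct.iter_unpack("<I", data[:aligned]):
        if word & ASCII_MASK:
            return False
    # Check the remaining 0 to 3 bytes one by one.
    return all(byte < 0x80 for byte in data[aligned:])
```

When the check succeeds, a decoder can copy the bytes straight through instead
of running the general multi-byte state machine.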

But I tried some "timeit" runs, and PyPy is often faster than CPython, and
never much slower.
Do your strings have many non-ascii characters?
What's the len(utf8)/len(unicode) ratio?
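
A minimal sketch of how one could measure both things, written for Python 3
(where str is the unicode type). The helper names utf8_ratio and
time_roundtrip are invented for this example and are not the script I ran;
absolute timings depend on the interpreter and on JIT warm-up.

```python
import codecs
import timeit


def utf8_ratio(text):
    """Encoded bytes per code point: 1.0 means the text is pure ASCII."""
    if not text:
        return 1.0
    return len(text.encode("utf-8")) / len(text)


def time_roundtrip(text, number=1000):
    """Return (encode_seconds, decode_seconds) for `number` calls each."""
    encoded, _ = codecs.utf_8_encode(text)
    enc = timeit.timeit(lambda: codecs.utf_8_encode(text), number=number)
    dec = timeit.timeit(lambda: codecs.utf_8_decode(encoded), number=number)
    return enc, dec


if __name__ == "__main__":
    # Made-up sample data: mostly ASCII with a few Greek letters.
    sample = "plain ascii text " * 50 + "\u03b1\u03b2\u03b3" * 10
    print("ratio:", utf8_ratio(sample))
    print("encode/decode seconds:", time_roundtrip(sample))
```

Running the same script under both interpreters, on data with a ratio close to
your real data, gives a fairer comparison than a pure-ASCII test string.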


-- 
Amaury Forgeot d'Arc

_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev
