2013/2/11 Eleytherios Stamatogiannakis <est...@gmail.com>

> On 11/02/13 18:13, Amaury Forgeot d'Arc wrote:
>
>>
>> 2013/2/11 Eleytherios Stamatogiannakis <est...@gmail.com>
>>
>>
>>     Right now we are using PyPy's "codecs.utf_8_encode" and
>>     "codecs.utf_8_decode" to do this conversion.
>>
>>
>> It's the most direct way to use the utf-8 conversion functions.
>>
>>     Is there a faster way to do these conversions (encoding, decoding)
>>     in PyPy? Does CPython do something more clever than PyPy, like
>>     storing unicodes with full ASCII char content, in an ASCII
>>     representation?
>>
>>
>> Over the years, utf-8 conversions have been heavily optimized in CPython:
>> allocate short buffers on the stack, use aligned reads, quick check for
>> ascii-only content (data & 0x80808080)...
>> PyPy does none of these things.
>>
>> But I tried some "timeit" runs, and pypy is often faster than CPython,
>> and never much slower.
>>
>
> This is odd. Maybe APSW uses some other CPython conversion API? Because
> the conversion overhead is not visible on CPython + APSW profiles.


Which kind of profiler are you using? It is possible that CPython's
built-in functions are not profiled the same way as PyPy's.
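
To take the profiler out of the picture, the two conversion functions can
be timed directly with "timeit" on both interpreters. This is only a rough
sketch: the sample text, its size and the repeat count are made-up values,
not your real workload.

```python
import codecs
import timeit

# Hypothetical sample data: plain ASCII text, as in the tests described.
text = u"plain ASCII input " * 1000
data = text.encode("utf8")


def encode():
    # Returns a (bytes, characters_consumed) tuple.
    return codecs.utf_8_encode(text)


def decode():
    # Returns a (unicode, bytes_consumed) tuple.
    return codecs.utf_8_decode(data)


# Total wall-clock seconds for 1000 calls of each conversion.
enc_seconds = timeit.timeit(encode, number=1000)
dec_seconds = timeit.timeit(decode, number=1000)
print("encode: %.6f s, decode: %.6f s" % (enc_seconds, dec_seconds))
```

Running the same script under CPython and PyPy gives numbers that do not
depend on how each profiler accounts for built-in functions.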


>> Do your strings have many non-ascii characters?
>> What's the len(utf8)/len(unicode) ratio?
>>
>>
> Our current tests use plain ASCII input (imported into sqlite3)
> which:
>
> - Go from sqlite3 (UTF-8) -> PyPy (unicode)
> - PyPy (unicode) -> sqlite3 (UTF-8).
>
> So I guess the len(utf-8)/len(unicode) = 1/4
> (assuming 1 byte per char for ASCII (UTF-8) and 4 bytes per char for
> PyPy's unicode storage)
>

No, my question was about the number of non-ascii characters:
    s = u"SomeUnicodeString"
    1.0 * len(s.encode('utf8')) / len(s)
PyPy allocates the StringBuffer up front, and must realloc to cope with
multibyte characters.
For English text, the ratio is 1.0; for Greek, it will be close to 2.0.
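
As a small illustration of the ratio I mean, here is a sketch. The helper
name utf8_ratio and the two sample strings are made up for the example;
substitute a representative sample of your own data.

```python
def utf8_ratio(s):
    """Return UTF-8 bytes per character of s (hypothetical helper)."""
    if not s:
        return 1.0  # treat the empty string like pure ASCII
    return 1.0 * len(s.encode('utf8')) / len(s)


english = u"SomeUnicodeString"
# "Kalimera kosme" written in Greek letters, using escapes so that the
# source file itself stays ASCII.
greek = (u"\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1 "
         u"\u03ba\u03cc\u03c3\u03bc\u03b5")

print(utf8_ratio(english))  # pure ASCII: one byte per character
print(utf8_ratio(greek))    # Greek letters take two bytes each in UTF-8
```

The Greek sample has 13 two-byte letters and one space, so it encodes to
27 bytes for 14 characters.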

-- 
Amaury Forgeot d'Arc
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev
