2013/2/11 Eleytherios Stamatogiannakis <est...@gmail.com>

> On 11/02/13 18:13, Amaury Forgeot d'Arc wrote:
>
>> 2013/2/11 Eleytherios Stamatogiannakis <est...@gmail.com
>> <mailto:est...@gmail.com>>
>>
>>> Right now we are using PyPy's "codecs.utf_8_encode" and
>>> "codecs.utf_8_decode" to do this conversion.
>>
>> It's the most direct way to use the utf-8 conversion functions.
>>
>>> Is there a faster way to do these conversions (encoding, decoding)
>>> in PyPy? Does CPython do something more clever than PyPy, like
>>> storing unicode strings whose content is entirely ASCII in an ASCII
>>> representation?
>>
>> Over the years, utf-8 conversions have been heavily optimized in CPython:
>> allocate short buffers on the stack, use aligned reads, quick check for
>> ascii-only content (data & 0x80808080)...
>> PyPy does none of these things.
>>
>> But I tried some "timeit" runs, and pypy is often faster than CPython,
>> and never much slower.
>
> This is odd. Maybe APSW uses some other CPython conversion API, because
> the conversion overhead is not visible in CPython + APSW profiles.
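The "quick check for ascii-only content (data & 0x80808080)" works because every byte of a multibyte UTF-8 sequence (lead and continuation bytes alike) has its top bit set, so a 4-byte word with none of its 0x80 bits set is pure ASCII and can be copied one byte per character. Below is a minimal pure-Python sketch of that idea; the function name is hypothetical, this is not CPython's actual C implementation, and in Python it will not beat the builtin codec. It only shows the logic.

```python
def is_ascii_fast(data: bytes) -> bool:
    """Return True if every byte in `data` is below 0x80 (pure ASCII)."""
    n = len(data)
    i = 0
    # Test 4 bytes at a time: any byte >= 0x80 sets one of the masked bits.
    while i + 4 <= n:
        word = int.from_bytes(data[i:i + 4], "little")
        if word & 0x80808080:
            return False
        i += 4
    # Check the remaining 0-3 tail bytes individually.
    for b in data[i:]:
        if b & 0x80:
            return False
    return True


print(is_ascii_fast(b"SomeUnicodeString"))          # True
print(is_ascii_fast("Καλημέρα".encode("utf-8")))   # False
```

A real decoder would use such a check to take a plain byte-copy path for ASCII runs and fall back to full UTF-8 decoding only when a high bit is found.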

Which kind of profiler are you using? It is possible that CPython builtin
functions are not profiled the same way as PyPy's.

>> Do your strings have many non-ascii characters?
>> What's the len(utf8)/len(unicode) ratio?
>
> Our current tests are using plain ASCII input (imported into sqlite3),
> which:
>
> - goes from sqlite3 (UTF-8) -> PyPy (unicode)
> - goes from PyPy (unicode) -> sqlite3 (UTF-8).
>
> So I guess the len(utf-8)/len(unicode) = 1/4
> (assuming 1 byte per char for ASCII (UTF-8) and 4 bytes per char for
> PyPy's unicode storage)

No, my question was about the number of non-ascii characters:

    s = u"SomeUnicodeString"
    1.0 * len(s.encode('utf8')) / len(s)

PyPy allocates the StringBuffer upfront, and must realloc to cope with
multibyte characters. For English text, the ratio is 1.0; for Greek, it
will be close to 2.0.

--
Amaury Forgeot d'Arc
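The ratio check above can be wrapped into a small helper and run on sample strings. A minimal sketch in Python 3 syntax (the helper name `utf8_ratio` and the 0.0 result for an empty string are assumptions for illustration): ASCII characters encode to one byte each, while Greek letters take two bytes each in UTF-8, so running Greek text, which mixes in ASCII spaces and punctuation, lands just under 2.0.

```python
def utf8_ratio(s: str) -> float:
    """Encoded UTF-8 bytes per code point; 1.0 means pure ASCII."""
    if not s:
        return 0.0  # hypothetical convention: avoid dividing by zero
    return 1.0 * len(s.encode("utf-8")) / len(s)


print(utf8_ratio(u"SomeUnicodeString"))  # 1.0
print(utf8_ratio(u"Καλημέρα"))           # 2.0
print(utf8_ratio(u"Καλημέρα κόσμε"))     # slightly below 2.0 (one ASCII space)
```

The further the ratio is from 1.0, the more often a buffer sized for one byte per character has to grow during encoding.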
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev