STINNER Victor added the comment:
Serhiy: maybe we can start with ascii?
--
title: Optimize coding with surrogateescape and surrogatepass error handlers -
surrogateescape is too slow
Serhiy Storchaka added the comment:
A few months ago I wrote a patch that drastically speeds up encoding and decoding
with the surrogateescape and surrogatepass error handlers. However, it causes a 25%
regression in decoding some UTF-8 data (U+0100-U+07FF, if I remember correctly)
with strict error
R. David Murray added the comment:
Why are bytes being escaped in a binary blob? The reason to use surrogateescape
is when you have data that is mostly text, should be processed as text, but can
have occasional binary data. That wouldn't seem to apply to a database binary
blob.
But that
New submission from INADA Naoki:
surrogateescape is the recommended way to mix binary data into a string protocol.
But surrogateescape is too slow, and this causes usability problems.
One actual problem is: https://github.com/PyMySQL/PyMySQL/issues/366
surrogateescape is slow because errorhandler is called
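For readers unfamiliar with the handler, here is a minimal sketch of what surrogateescape does when binary data is mixed into text. The helper name and the sample payload are hypothetical and are not taken from PyMySQL or from any patch on this issue.

```python
def escape_roundtrip(raw: bytes) -> str:
    """Decode raw as ASCII, carrying non-ASCII bytes through as lone surrogates."""
    # Each byte 0x80-0xFF that ASCII cannot decode becomes U+DC80-U+DCFF.
    text = raw.decode("ascii", "surrogateescape")
    # Encoding with the same handler restores the original bytes exactly.
    assert text.encode("ascii", "surrogateescape") == raw
    return text


# Hypothetical payload: ASCII text with two undecodable bytes in the middle.
sample = b"id=7;blob=\xff\x80;end"
decoded = escape_roundtrip(sample)
print(ascii(decoded))
```

The round trip is lossless, which is why the handler is attractive for protocols that are mostly text but may carry arbitrary bytes.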
INADA Naoki added the comment:
On a MacBook Pro (Core i5, 2.6 GHz), decoding 1 MB of data with surrogateescape takes about 250 ms.
In [1]: bs = bytes(range(256)) * (4 * 1024)
In [2]: len(bs)
Out[2]: 1048576
In [3]: %timeit x = bs.decode('ascii', 'surrogateescape')
1 loops, best of 3: 249 ms per loop
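The same measurement can be reproduced outside IPython with the standard timeit module. This is only a sketch: the helper name time_decode is made up for illustration, and the result depends on the machine and the Python version.

```python
import timeit


def time_decode(data: bytes, errors: str = "surrogateescape", repeat: int = 3) -> float:
    """Return the best time in seconds for one data.decode('ascii', errors) call."""
    runs = timeit.repeat(lambda: data.decode("ascii", errors), number=1, repeat=repeat)
    return min(runs)


# Same input as in the session above: 1 MiB, half of the bytes are non-ASCII.
bs = bytes(range(256)) * (4 * 1024)
best = time_decode(bs)
print(f"{len(bs)} bytes decoded in {best * 1000:.1f} ms (best of 3)")
```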
--