[issue24870] surrogateescape is too slow

2015-08-15 Thread STINNER Victor
STINNER Victor added the comment: Serhiy: maybe we can start with ascii? -- title: Optimize coding with surrogateescape and surrogatepass error handlers -> surrogateescape is too slow

[issue24870] surrogateescape is too slow

2015-08-15 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: A few months ago I wrote a patch that drastically speeds up encoding and decoding with the surrogateescape and surrogatepass error handlers. However, it causes a 25% regression in decoding some UTF-8 data (U+0100-U+07FF, if I remember correctly) with strict error

[issue24870] surrogateescape is too slow

2015-08-15 Thread R. David Murray
R. David Murray added the comment: Why are bytes being escaped in a binary blob? The reason to use surrogateescape is when you have data that is mostly text, should be processed as text, but can have occasional binary data. That wouldn't seem to apply to a database binary blob. But that
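The "mostly text with occasional binary data" case described above can be illustrated with a minimal sketch. The sample bytes and field names are made up for illustration; only the `surrogateescape` handler itself comes from the thread.

```python
# Hypothetical sample: mostly UTF-8 text with one stray non-UTF-8 byte (0xff).
raw = b"name=caf\xc3\xa9;flag=\xff"

# Undecodable bytes are mapped to lone surrogates instead of raising an error.
text = raw.decode("utf-8", "surrogateescape")

# Byte 0xff becomes the lone surrogate U+DCFF.
last_char = text[-1]

# The decoded value can be processed as ordinary text.
fields = dict(item.split("=", 1) for item in text.split(";"))

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", "surrogateescape")
print(fields["name"], hex(ord(last_char)), roundtrip == raw)
```

The round trip is lossless only when the same codec and the same error handler are used in both directions.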

[issue24870] surrogateescape is too slow

2015-08-14 Thread INADA Naoki
New submission from INADA Naoki: surrogateescape is the recommended way to mix binary data into a string protocol. But surrogateescape is too slow, and this causes usability problems. One actual problem is: https://github.com/PyMySQL/PyMySQL/issues/366 surrogateescape is slow because the error handler is called
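One way to observe how often an error handler is invoked is to register a custom handler that counts its own calls. The sketch below is an illustration, not the code from the issue: the handler name `counting_escape` is hypothetical, and it only imitates the decoding side of `surrogateescape`. The call count depends on the codec and the Python version, so it is printed rather than assumed.

```python
import codecs

calls = 0  # number of times the handler has been invoked


def counting_escape(exc):
    """Imitate surrogateescape for decoding while counting invocations."""
    global calls
    calls += 1
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Map each undecodable byte 0x80..0xFF to U+DC80..U+DCFF.
    replacement = "".join(chr(0xDC00 + b) for b in bad)
    return replacement, exc.end


# "counting_escape" is a made-up name used only for this sketch.
codecs.register_error("counting_escape", counting_escape)

data = bytes(range(256))  # 128 ASCII bytes followed by 128 non-ASCII bytes
text = data.decode("ascii", "counting_escape")
print("handler calls:", calls)
```

Every invocation goes through a Python-level call and builds an exception object, which is the kind of overhead a fast path inside the codec can avoid.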

[issue24870] surrogateescape is too slow

2015-08-14 Thread INADA Naoki
INADA Naoki added the comment: On a MacBook Pro (Core i5, 2.6 GHz), decoding 1 MB of data with surrogateescape takes about 250 ms.

    In [1]: bs = bytes(range(256)) * (4 * 1024)
    In [2]: len(bs)
    Out[2]: 1048576
    In [3]: %timeit x = bs.decode('ascii', 'surrogateescape')
    1 loops, best of 3: 249 ms per loop
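The measurement above can be reproduced outside IPython with the standard `timeit` module. This is a rough sketch: the `latin-1` baseline is an assumption added here for comparison and does not appear in the original report, and the timings will differ by machine and Python version.

```python
import timeit

# Same input as in the report: 1 MiB in which half of the bytes are non-ASCII.
bs = bytes(range(256)) * (4 * 1024)


def decode_escaped():
    return bs.decode("ascii", "surrogateescape")


def decode_latin1():
    # Assumed baseline: a decode of the same bytes that never hits an error.
    return bs.decode("latin-1")


# Best of three single runs, mirroring the "best of 3" in the %timeit output.
escaped_s = min(timeit.repeat(decode_escaped, number=1, repeat=3))
latin1_s = min(timeit.repeat(decode_latin1, number=1, repeat=3))

print(f"ascii + surrogateescape: {escaped_s * 1000:.1f} ms")
print(f"latin-1 baseline:        {latin1_s * 1000:.1f} ms")
```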