[issue24870] surrogateescape is too slow

2015-08-15 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy: maybe we can start with ascii?

--
title: Optimize coding with surrogateescape and surrogatepass error handlers -> surrogateescape is too slow

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24870
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24870] surrogateescape is too slow

2015-08-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

A few months ago I wrote a patch that drastically speeds up encoding and 
decoding with the surrogateescape and surrogatepass error handlers. However, it 
causes a 25% regression in decoding some UTF-8 data (U+0100-U+07FF, if I 
remember correctly) with the strict error handler, so it needs some work. I 
hope it is possible to rewrite the UTF-8 decoder so as to avoid the regression. 
The patch was postponed until 3.5 is released. In any case, the patch is large 
and complex enough to count as a new feature that can appear only in 3.6.

--
assignee:  -> serhiy.storchaka
nosy: +serhiy.storchaka
versions:  -Python 3.4, Python 3.5




[issue24870] surrogateescape is too slow

2015-08-15 Thread R. David Murray

R. David Murray added the comment:

Why are bytes being escaped in a binary blob? The reason to use surrogateescape 
is when you have data that is mostly text, should be processed as text, but can 
have occasional binary data.  That wouldn't seem to apply to a database binary 
blob.
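
To illustrate the "mostly text with occasional binary data" case described 
above, here is a minimal sketch. The sample bytes are made up for the example 
and do not come from pymysql or the email library.

```python
# Mostly-text data with one stray byte (0xff) that is not valid UTF-8.
raw = b"name=caf\xc3\xa9; junk=\xff"

# Decoding with surrogateescape keeps the text readable and maps the
# undecodable byte 0xff to the lone surrogate U+DCFF.
text = raw.decode("utf-8", "surrogateescape")
assert text == "name=caf\u00e9; junk=\udcff"

# Encoding with the same error handler restores the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw
```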

But that aside, if you want to submit a patch to speed up surrogateescape 
without changing its functionality, I'm sure it would be considered.  It would 
certainly be useful for the email library, which currently does do the stupid 
thing of encoding binary message attachments using surrogateescape (and I'm 
guessing the reason pymysql does it is something similar to why email does it: 
the code would need to be significantly reorganized to do things right).

--
nosy: +r.david.murray
versions:  -Python 3.2, Python 3.3




[issue24870] surrogateescape is too slow

2015-08-14 Thread INADA Naoki

New submission from INADA Naoki:

surrogateescape is the recommended way to mix binary data into a string protocol.
But surrogateescape is too slow, and that causes usability problems.

One actual problem is: https://github.com/PyMySQL/PyMySQL/issues/366

surrogateescape is slow because the error handler is called with a UnicodeError
object. bs.decode('utf-8', 'surrogateescape') may produce len(bs)/2 error objects 
internally when bs is random bytes.
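
The per-byte handler calls can be observed with a rough sketch like the one 
below. It registers a wrapper around the built-in handler under a made-up name, 
"counting_surrogateescape" (an assumption for this example, not a standard 
handler), and counts how often the decoder invokes it. It uses the ASCII codec, 
where every byte >= 0x80 is an error.

```python
import codecs

calls = 0
# Look up the real built-in handler so the wrapper can delegate to it.
_surrogateescape = codecs.lookup_error("surrogateescape")

def counting_surrogateescape(exc):
    """Count each invocation, then delegate to the built-in handler."""
    global calls
    calls += 1  # exc is a freshly created UnicodeDecodeError
    return _surrogateescape(exc)

# Hypothetical handler name, used only for this illustration.
codecs.register_error("counting_surrogateescape", counting_surrogateescape)

bs = bytes(range(256))  # 128 ASCII bytes followed by 128 non-ASCII bytes
text = bs.decode("ascii", "counting_surrogateescape")

# The result is the same as with the built-in handler.
assert text == bs.decode("ascii", "surrogateescape")
print(calls, "handler calls for", len(bs), "input bytes")
```

Each handler call receives its own exception object, which is the overhead 
described above.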

surrogateescape is ordinarily used with the ASCII and UTF-8 encodings.
A specialized implementation can make it faster.

I would like this issue to be solved in Python 3.4 and Python 3.5, since it's a
critical problem for some people.

--
components: Unicode
messages: 248631
nosy: ezio.melotti, haypo, naoki
priority: normal
severity: normal
status: open
title: surrogateescape is too slow
type: performance
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6




[issue24870] surrogateescape is too slow

2015-08-14 Thread INADA Naoki

INADA Naoki added the comment:

On a MacBook Pro (Core i5 2.6GHz), decoding 1MB of data with surrogateescape takes about 250ms.

In [1]: bs = bytes(range(256)) * (4 * 1024)

In [2]: len(bs)
Out[2]: 1048576

In [3]: %timeit x = bs.decode('ascii', 'surrogateescape')
1 loops, best of 3: 249 ms per loop
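
For anyone without IPython, the same measurement can be sketched as a 
standalone script using the standard timeit module. The repeat count of 3 is an 
arbitrary choice for this example, and timings will differ by machine and 
Python version.

```python
import timeit

# Same input as above: every byte value, repeated to make 1 MiB of data.
bs = bytes(range(256)) * (4 * 1024)
assert len(bs) == 1048576

# Time one decode per run and keep the best of three runs.
timings = timeit.repeat(
    lambda: bs.decode("ascii", "surrogateescape"),
    number=1,
    repeat=3,
)
best = min(timings)
print("best of 3: %.1f ms per loop" % (best * 1000))
```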

--
