[issue24870] Optimize ascii and latin1 decoder with surrogateescape and surrogatepass error handlers

2016-10-06 Thread INADA Naoki
Changes by INADA Naoki:
resolution: -> fixed
status: open -> closed

2016-01-07 Thread INADA Naoki
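INADA Naoki added the comment:
FYI, I found a workaround: https://github.com/PyMySQL/PyMySQL/pull/409

    # Bytes 0x00-0x7f map to themselves; bytes 0x80-0xff map to U+DC80-U+DCFF,
    # the same lone surrogates that the surrogateescape error handler produces.
    _table = [chr(i) for i in range(128)] + [chr(i) for i in range(0xdc80, 0xdd00)]

    def decode_surroundescape(s):
        return s.decode('latin1').translate(_table)

    In [15]: data = b'\xff' * 1024 *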

2016-01-07 Thread STINNER Victor
STINNER Victor added the comment:
> In [18]: %timeit decode_surroundescape(data)
> 10 loops, best of 3: 40 ms per loop

Cool! Good job.

2015-10-10 Thread INADA Naoki
INADA Naoki added the comment:
UTF-8 and Latin1 are the typical encodings for MySQL queries. When inserting a BLOB:

    # Decode binary data
    x = data.decode('ascii', 'surrogateescape')
    # %-format query
    psql = sql % (escape(x),)  # sql is unicode
    # Encode sql to connection encoding (latin1 or utf8)
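The BLOB flow described above depends on surrogateescape being reversible. When the query is encoded with the same error handler, the lone surrogates become the original raw bytes again. This is a minimal sketch that assumes a hypothetical, simplified `escape()`, which only doubles single quotes and wraps the value in quotes. Real MySQL escaping also has to handle backslashes, NUL bytes, and other characters. `build_query` is also hypothetical.

```python
def escape(x):
    # Hypothetical, simplified escaping for illustration only.
    return "'" + x.replace("'", "''") + "'"


def build_query(sql, data, conn_encoding='utf8'):
    # Decode binary data: non-ASCII bytes become lone surrogates U+DC80-U+DCFF.
    x = data.decode('ascii', 'surrogateescape')
    # %-format the (str) query.
    psql = sql % (escape(x),)
    # Encode to the connection encoding; surrogateescape turns the
    # surrogates back into the original raw bytes.
    return psql.encode(conn_encoding, 'surrogateescape')


blob = b'\x00\x89PNG\xff'
query = build_query("INSERT INTO t (b) VALUES (%s)", blob)
print(query)  # b"INSERT INTO t (b) VALUES ('\x00\x89PNG\xff')"
```

The same code works with `conn_encoding='latin1'`, because the latin1 encoder also accepts surrogateescape.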

2015-10-09 Thread STINNER Victor
STINNER Victor added the comment:
INADA Naoki: "I want Python 3.4 and Python 3.5 to solve this issue since it's a critical problem for some people."

On microbenchmarks, the optimizations that I just implemented in Python 3.6 are impressive. The problem is that the implementation is quite

2015-10-09 Thread STINNER Victor
STINNER Victor added the comment:
Short summary. OK, I optimized the ASCII, Latin1 and UTF-8 codecs (encoders and decoders) for the most common error handlers.

* ASCII and Latin1 encoders: surrogateescape, replace, ignore, backslashreplace, xmlcharrefreplace
* ASCII decoder: surrogateescape,
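Here is what the error handlers listed above actually produce. The example shows their behaviour only and says nothing about the optimized C code paths. The sample string is arbitrary.

```python
text = 'caf\u00e9 \u20ac'  # 'café €': U+00E9 fits in Latin-1, U+20AC does not

for handler in ('replace', 'ignore', 'backslashreplace', 'xmlcharrefreplace'):
    print(handler, text.encode('ascii', handler))
# replace b'caf? ?'
# ignore b'caf '
# backslashreplace b'caf\\xe9 \\u20ac'
# xmlcharrefreplace b'caf&#233; &#8364;'

# surrogateescape only handles lone surrogates U+DC80-U+DCFF,
# writing them back as the raw bytes 0x80-0xff.
print('abc\udcff'.encode('latin1', 'surrogateescape'))  # b'abc\xff'
# ASCII decoder with surrogateescape: the inverse operation.
print(b'abc\xff'.decode('ascii', 'surrogateescape') == 'abc\udcff')  # True
```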

2015-10-02 Thread STINNER Victor
STINNER Victor added the comment:
I created issue #25301: "Optimize UTF-8 decoder with error handlers".

2015-10-01 Thread STINNER Victor
STINNER Victor added the comment:
I just pushed my patch to optimize the UTF-8 encoder with error handlers: see issue #25267. It's up to 70 times as fast. The patch was based on Serhiy's work: faster_surrogates_hadling.patch, attached to this issue.
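The next example shows how the two handlers named in this issue's title, surrogatepass and surrogateescape, behave with the UTF-8 codec. It illustrates semantics only and does not reproduce the speed-up reported in #25267.

```python
s = 'a\ud800b'  # contains a lone surrogate, which strict UTF-8 rejects

try:
    s.encode('utf-8')
except UnicodeEncodeError as exc:
    print('strict:', exc.reason)

# surrogatepass encodes the surrogate as its 3-byte UTF-8 form anyway...
raw = s.encode('utf-8', 'surrogatepass')
print(raw)  # b'a\xed\xa0\x80b'
# ...and the decoder needs the same handler to read it back.
print(raw.decode('utf-8', 'surrogatepass') == s)  # True

# surrogateescape only accepts U+DC80-U+DCFF and emits one raw byte for each.
print('a\udcffb'.encode('utf-8', 'surrogateescape'))  # b'a\xffb'
```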

2015-09-24 Thread STINNER Victor
Changes by STINNER Victor:
title: Optimize coding with surrogateescape and surrogatepass error handlers -> Optimize ascii and latin1 decoder with surrogateescape and surrogatepass error handlers

2015-09-24 Thread STINNER Victor
STINNER Victor added the comment:
Serhiy wrote: "All other error handlers lose information and can't be used per se for transcoding bytes as string or string as bytes."

Well, it was very simple to implement replace and ignore in decoders. I believe that these error handlers are commonly used.
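You can check Serhiy's point directly. After decoding with replace or ignore, the original bytes cannot be recovered, but surrogateescape round-trips. This is a minimal sketch, and `round_trips` is a hypothetical helper used only for illustration.

```python
def round_trips(data, handler):
    # Decode with the given handler, then try to re-encode to the original bytes.
    text = data.decode('ascii', handler)
    try:
        return text.encode('ascii', 'surrogateescape') == data
    except UnicodeEncodeError:
        # e.g. U+FFFD from 'replace' cannot be turned back into a byte.
        return False


data = b'abc\xff\xfe'
print(data.decode('ascii', 'replace') == 'abc\ufffd\ufffd')  # True
print(data.decode('ascii', 'ignore'))  # abc
for handler in ('replace', 'ignore', 'surrogateescape'):
    print(handler, round_trips(data, handler))
# replace False
# ignore False
# surrogateescape True
```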