New submission from Xiang Zhang:

Currently the utf7 encoder uses an aggressive memory allocation strategy: it 
allocates for a worst case of 8 bytes per character. We can tighten that bound.

For strings stored with 1 or 2 bytes per character, the worst case is 
3*n + 2 bytes, where n is the string length. For strings stored with 4 bytes 
per character, the worst case is 6*n + 2.

There are 2 cases behind the 3*n + 2 bound. First, if all characters need to 
be encoded, the result length is ceil(2.67*n) + 2 <= 3*n + 2. Second, if 
encoded and unencoded characters alternate one by one, the result length is 
3*n < 3*n + 2 for even length and exactly 3*n + 2 for odd length.
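
The bounds above can be spot-checked from Python. This is only a rough sketch, 
not part of the patch: utf7_worst_case is a hypothetical helper written for 
this illustration, and it assumes "1 or 2 byte" means every code point is at 
most U+FFFF.

```python
def utf7_worst_case(s):
    """Hypothetical helper: proposed upper bound on len(s.encode("utf-7"))."""
    # Assumption: 1- and 2-byte strings contain only code points <= U+FFFF.
    if all(ord(c) <= 0xFFFF for c in s):
        return 3 * len(s) + 2
    return 6 * len(s) + 2


samples = [
    "abc" * 10,              # nothing needs encoding
    "\u20ac" * 3,            # every character needs encoding
    "\u20aca\u20ac",         # alternating, odd length
    "\u20aca\u20aca",        # alternating, even length
    "\U0001F600a\U0001F600", # 4-byte string
]

for s in samples:
    encoded = s.encode("utf-7")
    # The encoded length must never exceed the proposed bound.
    assert len(encoded) <= utf7_worst_case(s)
    print(len(s), len(encoded), utf7_worst_case(s))
```

The odd-length alternating sample is the one that reaches the bound exactly.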

This makes little difference when the string is short. But when the string is 
long, encoding is noticeably faster (178 us vs. 102 us for "abc"*10000 in the 
timings below).

Without patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.79 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.55 us +- 0.13 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 14.0 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 178 us +- 1 us

With patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.87 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.50 us +- 0.23 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 13.3 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 102 us +- 1 us

The patch also removes a redundant check: base64bits can only be nonzero when 
inShift is nonzero.

----------
components: Interpreter Core
files: utf7_encoder.patch
keywords: patch
messages: 279419
nosy: haypo, serhiy.storchaka, xiang.zhang
priority: normal
severity: normal
stage: patch review
status: open
title: Improve utf7 encoder memory usage
type: enhancement
versions: Python 3.7
Added file: http://bugs.python.org/file45219/utf7_encoder.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28531>
_______________________________________