New submission from Xiang Zhang:

Currently the utf7 encoder uses an aggressive memory allocation strategy: it sizes the output buffer for a worst case of 8 times the input length. We can tighten that worst case.
For 1-byte and 2-byte unicode strings, the worst case is 3*n + 2. For 4-byte unicode strings, the worst case is 6*n + 2.

Two cases justify the 3*n + 2 bound. First, if every character needs to be encoded, each character contributes 16 bits at 6 bits per base64 character, so the result length is ceil(2.67*n) + 2 <= 3*n + 2. Second, if encoded and directly written characters alternate, the result length is 3n < 3n + 2 for even n, and exactly 3n + 2 for odd n (when the string both starts and ends with an encoded character).

The benefit is negligible for short strings, but encoding gets noticeably faster for long ones.

Without patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.79 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.55 us +- 0.13 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 14.0 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 178 us +- 1 us

With patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.87 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.50 us +- 0.23 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 13.3 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 102 us +- 1 us

The patch also removes a redundant check: base64bits can be nonzero only when inShift is nonzero.
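The 3*n + 2 and 6*n + 2 bounds can be spot-checked from Python itself. The following is only a rough sketch, not part of the patch: `utf7_upper_bound` and the sample strings are hypothetical names chosen for illustration, and the check relies on the built-in "utf-7" codec rather than on the C encoder's internals.

```python
def utf7_upper_bound(s: str) -> int:
    """Upper bound on the UTF-7 length, using the formulas from the message.

    Hypothetical helper for illustration only; it is not part of CPython.
    """
    n = len(s)
    # A code point above U+FFFF forces 4-byte storage; each such character
    # becomes a surrogate pair (two 16-bit units) before base64 encoding.
    per_char = 6 if any(ord(ch) > 0xFFFF for ch in s) else 3
    return per_char * n + 2


samples = {
    "all direct": "abc" * 10,
    "all encoded": "\u00e9" * 7,
    "alternating, even length": "\u00e9a" * 4,
    "alternating, odd length": "\u00e9a" * 4 + "\u00e9",
    "non-BMP": "\U0001F600" * 3,
}

for label, s in samples.items():
    encoded = s.encode("utf-7")
    bound = utf7_upper_bound(s)
    # The encoded length should never exceed the proposed allocation.
    assert len(encoded) <= bound, (label, len(encoded), bound)
    print(f"{label}: n={len(s)} encoded={len(encoded)} bound={bound}")
```

The odd-length alternating sample (n = 9) encodes to 29 bytes, which meets the 3*n + 2 bound exactly; the other samples stay below it.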
----------
components: Interpreter Core
files: utf7_encoder.patch
keywords: patch
messages: 279419
nosy: haypo, serhiy.storchaka, xiang.zhang
priority: normal
severity: normal
stage: patch review
status: open
title: Improve utf7 encoder memory usage
type: enhancement
versions: Python 3.7

Added file: http://bugs.python.org/file45219/utf7_encoder.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28531>
_______________________________________