New submission from Ma Lin <malin...@163.com>: 🔵 bz2/lzma module's current growth algorithm
bz2/lzma module's initial output buffer size is 8KB [1][2], and they are using this output buffer growth algorithm [3][4]: newsize = size + (size >> 3) + 6 [1] https://github.com/python/cpython/blob/v3.9.0b4/Modules/_bz2module.c#L109 [2] https://github.com/python/cpython/blob/v3.9.0b4/Modules/_lzmamodule.c#L124 [3] https://github.com/python/cpython/blob/v3.9.0b4/Modules/_lzmamodule.c#L133 [4] https://github.com/python/cpython/blob/v3.9.0b4/Modules/_bz2module.c#L121 For many case, the output buffer is resized too many times. You may paste this code to REPL to see the growth step: size = 8*1024 for i in range(1, 120): print('Step %d ' % i, format(size, ','), 'bytes') size = size + (size >> 3) + 6 Step 1 8,192 bytes Step 2 9,222 bytes Step 3 10,380 bytes Step 4 11,683 bytes Step 5 13,149 bytes Step 6 14,798 bytes ... 🔵 zlib module's current growth algorithm zlib module's initial output buffer size is 16KB [5], in each growth the buffer size doubles [6]. [5] https://github.com/python/cpython/blob/v3.9.0b4/Modules/zlibmodule.c#L32 [6] https://github.com/python/cpython/blob/v3.9.0b4/Modules/zlibmodule.c#L174 This algorithm has a higher risk of running out of memory: ... Step 14 256 MB Step 15 512 MB Step 16 1 GB Step 17 2 GB Step 18 4 GB Step 19 8 GB Step 20 16 GB Step 21 32 GB Step 22 64 GB ... 🔵 Add _BlocksOutputBuffer for bz2/lzma/zlib module Proposed PR uses a list of bytes object to represent output buffer. It can eliminate the overhead of resizing (bz2/lzma), and prevent excessive memory footprint (zlib). I only tested decompression, because the result is more obvious than compression. For special data benchmark (all data consists of b'a'), see these attached pictures, _BlocksOutputBuffer has linear performance: (Benchmark by attached file benchmark.py) 0to2GB_step30MB.png (Decompress from 0 to 2GB, 30MB step) 0to200MB_step2MB.png (Decompress from 0 to 200MB, 2MB step) 0to20MB_step64KB.png (Decompress from 0 to 20MB, 64KB step) After switching to _BlocksOutputBuffer, the code of bz2/lzma is more concise, the code of zlib is basically translated statement by statement, IMO it's safe and easy for review. 🔵 Real data benchmark For real data, the weight of resizing output buffer is not big, so the performance improvement is not as big as above pictures: (Benchmark by attached file benchmark_real.py) ----------------- bz2 ----------------- linux-2.6.39.4.tar.bz2 input size: 76,097,195, output size: 449,638,400 best of 5: [baseline_raw] 12.954 sec -> [patched_raw] 11.600 sec, 1.12x faster (-10%) firefox-79.0.linux-i686.tar.bz2 input size: 74,109,706, output size: 228,055,040 best of 5: [baseline_raw] 8.511 sec -> [patched_raw] 7.829 sec, 1.09x faster (-8%) ffmpeg-4.3.1.tar.bz2 input size: 11,301,038, output size: 74,567,680 best of 5: [baseline_raw] 1.915 sec -> [patched_raw] 1.671 sec, 1.15x faster (-13%) gimp-2.10.20.tar.bz2 input size: 33,108,938, output size: 214,179,840 best of 5: [baseline_raw] 5.794 sec -> [patched_raw] 4.964 sec, 1.17x faster (-14%) sde-external-8.56.0-2020-07-05-lin.tar.bz2 input size: 26,746,086, output size: 92,129,280 best of 5: [baseline_raw] 3.153 sec -> [patched_raw] 2.835 sec, 1.11x faster (-10%) ----------------- lzma ----------------- linux-5.7.10.tar.xz input size: 112,722,840, output size: 966,062,080 best of 5: [baseline_raw] 9.813 sec -> [patched_raw] 7.434 sec, 1.32x faster (-24%) linux-2.6.39.4.tar.xz input size: 63,243,812, output size: 449,638,400 best of 5: [baseline_raw] 5.256 sec -> [patched_raw] 4.200 sec, 1.25x faster (-20%) gcc-9.3.0.tar.xz input size: 70,533,868, output size: 618,608,640 best of 5: [baseline_raw] 6.398 sec -> [patched_raw] 4.878 sec, 1.31x faster (-24%) Python-3.8.5.tar.xz input size: 18,019,640, output size: 87,531,520 best of 5: [baseline_raw] 1.315 sec -> [patched_raw] 1.098 sec, 1.20x faster (-16%) firefox-79.0.source.tar.xz input size: 333,220,776, output size: 2,240,573,440 best of 5: [baseline_raw] 25.339 sec -> [patched_raw] 19.661 sec, 1.29x faster (-22%) ----------------- zlib ----------------- linux-5.7.10.tar.gz input size: 175,493,557, output size: 966,062,080 best of 5: [baseline_raw] 2.360 sec -> [patched_raw] 2.401 sec, 1.02x slower (+2%) linux-2.6.39.4.tar.gz input size: 96,011,459, output size: 449,638,400 best of 5: [baseline_raw] 1.215 sec -> [patched_raw] 1.216 sec, 1.00x slower (+0%) gcc-9.3.0.tar.gz input size: 124,140,228, output size: 618,608,640 best of 5: [baseline_raw] 1.668 sec -> [patched_raw] 1.555 sec, 1.07x faster (-7%) Python-3.8.5.tgz input size: 24,149,093, output size: 87,531,520 best of 5: [baseline_raw] 0.263 sec -> [patched_raw] 0.253 sec, 1.04x faster (-4%) openjdk-14.0.2_linux-x64_bin.tar.gz input size: 198,606,190, output size: 335,175,680 best of 5: [baseline_raw] 1.273 sec -> [patched_raw] 1.221 sec, 1.04x faster (-4%) postgresql-10.12-1-linux-x64-binaries.tar.gz input size: 159,090,037, output size: 408,678,400 best of 5: [baseline_raw] 1.415 sec -> [patched_raw] 1.401 sec, 1.01x faster (-1%) -------------- benchmark end -------------- 🔵 Add .readall() function to _compression.DecompressReader class The last commit add .readall() function to _compression.DecompressReader, it optimize the .read(-1) for BZ2File/LZMAFile/GzipFile object. The following description is a bit complicate: 1, BZ2File/LZMAFile/GzipFile object has a `self._buffer` attribute [7]: raw = _compression.DecompressReader(fp, BZ2Decompressor/LZMADecompressor/zlib.decompressobj) self._buffer = io.BufferedReader(raw) [7] https://github.com/python/cpython/blob/v3.9.0b4/Lib/lzma.py#L130-L132 2, When call `.read(-1)` function (read all data) on BZ2File/LZMAFile/GzipFile object, will dispatch to `self._buffer`'s .read(-1) function [8]. [8] https://github.com/python/cpython/blob/v3.9.0b4/Lib/lzma.py#L200 3, `self._buffer` is an io.BufferedReader object, it will dispatch to underlying stream's readall() function [9]. [9] https://github.com/python/cpython/blob/v3.9.0b4/Modules/_io/bufferedio.c#L1538-L1542 4, The underlying stream is a _compression.DecompressReader object, which is a subclass of io.RawIOBase [10]. [10] https://github.com/python/cpython/blob/v3.9.0b4/Lib/_compression.py#L33 5, Then io.RawIOBase's readall() function is called, but it's very inefficient. It only read DEFAULT_BUFFER_SIZE (8KB) data each time. [11] [11] https://github.com/python/cpython/blob/v3.9.0b4/Modules/_io/iobase.c#L968-L969 6, So each time the decompressor will try to decompress 8KB input data to a 8KB output buffer [12]: data = self._decompressor.decompress(8KB_input_buffer, 8KB_output_buffer) [12] https://github.com/python/cpython/blob/v3.9.0b4/Lib/_compression.py#L103 In most cases, the input buffer will not be fully decompressed, this brings unnecessary overhead. After adding the .readall() function to _compression.DecompressReader, the limit of the output buffer size has been removed. This change requires _BlocksOutputBuffer, otherwise, in extreme cases (small input buffer be decompressed to 100MB data), it will be slower due to the overhead of buffer resizing. Benchmark with real data, call .read(-1) on XXXXFile object: (Benchmark by attached file benchmark_real.py) ----------------- BZ2File ----------------- linux-2.6.39.4.tar.bz2 input size: 76,097,195, output size: 449,638,400 best of 5: [baseline_file] 12.371 sec -> [patched_file] 12.035 sec, 1.03x faster (-3%) firefox-79.0.linux-i686.tar.bz2 input size: 74,109,706, output size: 228,055,040 best of 5: [baseline_file] 8.233 sec -> [patched_file] 8.124 sec, 1.01x faster (-1%) ffmpeg-4.3.1.tar.bz2 input size: 11,301,038, output size: 74,567,680 best of 5: [baseline_file] 1.747 sec -> [patched_file] 1.724 sec, 1.01x faster (-1%) gimp-2.10.20.tar.bz2 input size: 33,108,938, output size: 214,179,840 best of 5: [baseline_file] 5.220 sec -> [patched_file] 5.162 sec, 1.01x faster (-1%) sde-external-8.56.0-2020-07-05-lin.tar.bz2 input size: 26,746,086, output size: 92,129,280 best of 5: [baseline_file] 2.935 sec -> [patched_file] 2.899 sec, 1.01x faster (-1%) ----------------- LZMAFile ----------------- linux-5.7.10.tar.xz input size: 112,722,840, output size: 966,062,080 best of 5: [baseline_file] 7.712 sec -> [patched_file] 7.670 sec, 1.01x faster (-1%) linux-2.6.39.4.tar.xz input size: 63,243,812, output size: 449,638,400 best of 5: [baseline_file] 4.342 sec -> [patched_file] 4.274 sec, 1.02x faster (-2%) gcc-9.3.0.tar.xz input size: 70,533,868, output size: 618,608,640 best of 5: [baseline_file] 5.077 sec -> [patched_file] 4.974 sec, 1.02x faster (-2%) Python-3.8.5.tar.xz input size: 18,019,640, output size: 87,531,520 best of 5: [baseline_file] 1.093 sec -> [patched_file] 1.087 sec, 1.01x faster (-1%) firefox-79.0.source.tar.xz input size: 333,220,776, output size: 2,240,573,440 best of 5: [baseline_file] 20.748 sec -> [patched_file] 20.276 sec, 1.02x faster (-2%) ----------------- GzipFile ----------------- linux-5.7.10.tar.gz input size: 175,493,567, output size: 966,062,080 best of 5: [baseline_file] 4.422 sec -> [patched_file] 3.803 sec, 1.16x faster (-14%) linux-2.6.39.4.tar.gz input size: 96,011,469, output size: 449,638,400 best of 5: [baseline_file] 2.119 sec -> [patched_file] 1.863 sec, 1.14x faster (-12%) gcc-9.3.0.tar.gz input size: 124,140,238, output size: 618,608,640 best of 5: [baseline_file] 2.825 sec -> [patched_file] 2.409 sec, 1.17x faster (-15%) Python-3.8.5.tgz input size: 24,149,103, output size: 87,531,520 best of 5: [baseline_file] 0.410 sec -> [patched_file] 0.349 sec, 1.18x faster (-15%) openjdk-14.0.2_linux-x64_bin.tar.gz input size: 198,606,200, output size: 335,175,680 best of 5: [baseline_file] 1.885 sec -> [patched_file] 1.702 sec, 1.11x faster (-10%) postgresql-10.12-1-linux-x64-binaries.tar.gz input size: 159,090,047, output size: 408,678,400 best of 5: [baseline_file] 2.236 sec -> [patched_file] 2.002 sec, 1.12x faster (-10%) -------------- benchmark end -------------- ---------- components: Library (Lib) messages: 374867 nosy: malin priority: normal severity: normal status: open title: Add _BlocksOutputBuffer for bz2/lzma/zlib module type: performance versions: Python 3.10 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue41486> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com