[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-07-04 Thread miss-islington
miss-islington added the comment: New changeset 22bcc0768e0f7eda2ae4de63aef113b1ddb4ddef by Miss Islington (bot) in branch '3.10': bpo-41486: zlib uses an UINT32_MAX sliding window for the output buffer (GH-26143)

2021-07-04 Thread Gregory P. Smith
Gregory P. Smith added the comment: New changeset a9a69bb3ea1e6cf54513717212aaeae0d61b24ee by Ma Lin in branch 'main': bpo-41486: zlib uses an UINT32_MAX sliding window for the output buffer (GH-26143) https://github.com/python/cpython/commit/a9a69bb3ea1e6cf54513717212aaeae0d61b24ee

2021-07-04 Thread miss-islington
Change by miss-islington : -- nosy: +miss-islington nosy_count: 3.0 -> 4.0 pull_requests: +25585 pull_request: https://github.com/python/cpython/pull/27025

2021-05-15 Thread Ma Lin
Ma Lin added the comment: Sorry, for the (init_size > UINT32_MAX) problem, I have a better solution. Please imagine this scenario:
- before the patch
- in 64-bit build
- use zlib.decompress() function
- the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB)
If set the `bufsize`

2021-05-15 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +24779 pull_request: https://github.com/python/cpython/pull/26143

2021-04-30 Thread Gregory P. Smith
Gregory P. Smith added the comment: New changeset 251ffa9d2b16b091046720628deb6a7906c35d29 by Ma Lin in branch 'master': bpo-41486: Fix initial buffer size can't > UINT32_MAX in zlib module (GH-25738) https://github.com/python/cpython/commit/251ffa9d2b16b091046720628deb6a7906c35d29

2021-04-30 Thread Gregory P. Smith
Gregory P. Smith added the comment: Renaming to OutputBuffer sounds like a good idea.
On Thu, Apr 29, 2021, 7:55 PM Ma Lin wrote:
> Ma Lin added the comment:
>
> Found a backward incompatible behavior.
>
> Before the patch, in 64-bit build, zlib module allows the initial size
> >

2021-04-29 Thread Ma Lin
Ma Lin added the comment: Found a backward-incompatible behavior. Before the patch, in a 64-bit build, the zlib module allows the initial size to be > UINT32_MAX. It creates a bytes object, and uses a sliding window to deal with the UINT32_MAX limit:
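The sliding-window idea mentioned in this comment can be illustrated with a small Python sketch. This is not the zlib module's C code; the function name `sliding_windows` and its parameters are hypothetical, chosen only to show how one large output buffer can be exposed to a 32-bit size field in pieces of at most UINT32_MAX bytes.

```python
UINT32_MAX = 0xFFFFFFFF  # largest value a 32-bit unsigned size field can hold


def sliding_windows(total_size, limit=UINT32_MAX):
    """Yield (offset, length) pairs that cover total_size bytes.

    Each length is at most `limit`, so every window fits in a
    32-bit size field even when total_size does not.
    Hypothetical helper for illustration only.
    """
    offset = 0
    while offset < total_size:
        length = min(total_size - offset, limit)
        yield offset, length
        offset += length


# A 10 GiB buffer (the example size used in the thread) needs three windows.
for offset, length in sliding_windows(10 * 2**30):
    print(offset, length)
```

Under these assumptions the consumer only ever sees a window that fits the 32-bit limit, while the underlying buffer keeps its full size.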

2021-04-29 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +24429 pull_request: https://github.com/python/cpython/pull/25738

2021-04-28 Thread Ma Lin
Ma Lin added the comment: Thanks for reviewing this big patch. Your review makes the code better.

2021-04-28 Thread Gregory P. Smith
Gregory P. Smith added the comment: Thanks, this is great work! Especially when living within the constraints of C and the existing code. -- resolution: -> fixed stage: patch review -> resolved status: open -> closed

2021-04-28 Thread Gregory P. Smith
Gregory P. Smith added the comment: New changeset f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231 by Ma Lin in branch 'master': bpo-41486: Faster bz2/lzma/zlib via new output buffering (GH-21740) https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231 --

2021-04-27 Thread Ma Lin
Ma Lin added the comment: The above changes were made in this commit: split core code and wrappers 55705f6dc28ff4dc6183e0eb57312c885d19090a After that commit, there is a new commit that resolves the code conflicts introduced by PR 22126 one hour ago. Merge branch 'master'

2021-04-26 Thread Ma Lin
Ma Lin added the comment: Very sorry for updating at the last moment. But after this update, we should not need to touch it in the future, so I think it's worth it. Please review the last commit in PR 21740; the previous commits have not been changed. IMO, if you use a Git client such as TortoiseGit,

2021-04-25 Thread Ma Lin
Ma Lin added the comment:
> The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If put
> the core code together, these defines can be put in a thin wrapper in
> _bz2module.c/_lzmamodule.c/zlibmodule.c files.
I tried it, and it looks good. I will update the PR within one or two

2021-04-11 Thread Ma Lin
Ma Lin added the comment:
> I don't really _like_ that this is a .h file acting as a C template to inject
> effectively the same static code into each module that wants to use it...
> Which I think is the concern Victor is expressing in a comment above.
I think so too. The defines of

2021-04-11 Thread Gregory P. Smith
Gregory P. Smith added the comment: Looking around, it appears you've proposed an independent implementation of this for the third-party brotli module? https://github.com/google/brotli/pull/856 That is what I mean about making this reusable :) --

2021-04-11 Thread Gregory P. Smith
Gregory P. Smith added the comment: I left some review comments on the PR. I like the algorithm being used. I don't really _like_ that this is a .h file acting as a C template to inject effectively the same static code into each module that wants to use it... Which I think is the

2021-04-09 Thread STINNER Victor
Change by STINNER Victor : -- nosy: -vstinner

2021-04-09 Thread Karthikeyan Singaravelan
Change by Karthikeyan Singaravelan : -- nosy: +methane

2021-04-06 Thread Gregory P. Smith
Change by Gregory P. Smith : -- nosy: +gregory.p.smith

2021-04-05 Thread Ma Lin
Ma Lin added the comment: ping

2020-10-28 Thread Ma Lin
Ma Lin added the comment: I modified the lzma module to use different growth factors; see the attached picture different_factors.png. 1.5x should be the growth factor of _PyBytesWriter under Windows. So if _PyBytesWriter is changed to use memory blocks, maybe there will be no performance improvement.
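To make the effect of a growth factor concrete, here is a minimal Python sketch that counts how many resizes a single growing buffer needs to reach a target size. It is only an illustration: the helper `resizes_needed` and the sizes used are assumptions, not code or measurements from the lzma module or from _PyBytesWriter.

```python
def resizes_needed(initial, target, factor):
    """Count multiplicative growth steps from `initial` until size >= target.

    Hypothetical helper for illustration; real allocators may round
    sizes differently.
    """
    count = 0
    size = initial
    while size < target:
        # Always grow by at least one byte so the loop terminates
        # even for very small sizes or factors close to 1.
        size = max(size + 1, int(size * factor))
        count += 1
    return count


# Compare a few factors for growing from 8 KiB to 1 GiB.
for factor in (1.125, 1.5, 2.0):
    print(factor, resizes_needed(8 * 1024, 2**30, factor))
```

A smaller factor means more resizes (and more copying) for the same final size, which is the trade-off that comparisons of growth factors typically explore.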

2020-10-26 Thread STINNER Victor
STINNER Victor added the comment: It would be interesting to see if using _PyBytesWriter in bz2, lzma, zlib and _io.FileIO.readall() would speed up the code. I would prefer to centralize the buffer allocation logic in _PyBytesWriter rather than having custom code in each file. _PyBytesWriter

2020-10-26 Thread STINNER Victor
STINNER Victor added the comment: Ma Lin proposed this approach (PR 21740) for _PyBytesWriter/_PyUnicodeWriter on python-dev: https://mail.python.org/archives/list/python-...@python.org/message/UMB52BEZCX424K5K2ZNPWV7ZTQAGYL53/ -- nosy: +vstinner

2020-08-05 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +20886 stage: -> patch review pull_request: https://github.com/python/cpython/pull/21740

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49368/benchmark_real.py

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49367/benchmark.py

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49365/0to200MB_step2MB.png

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49366/0to20MB_step64KB.png

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49364/0to2GB_step30MB.png

2020-08-05 Thread Ma Lin
New submission from Ma Lin:
bz2/lzma module's current growth algorithm
bz2/lzma module's initial output buffer size is 8KB [1][2], and they are using this output buffer growth algorithm [3][4]:
    newsize = size + (size >> 3) + 6
[1]
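The growth formula quoted in the submission can be traced with a short Python sketch. The formula and the 8KB initial size come from the text above; the helper names `next_size` and `growth_steps` and the 1 MiB target are assumptions made for illustration.

```python
def next_size(size):
    """One growth step, as quoted: newsize = size + (size >> 3) + 6."""
    return size + (size >> 3) + 6


def growth_steps(initial, target):
    """Return the sequence of buffer sizes from `initial` until >= target.

    Hypothetical helper for illustration only.
    """
    sizes = [initial]
    while sizes[-1] < target:
        sizes.append(next_size(sizes[-1]))
    return sizes


# Starting from the 8KB initial buffer, see how many resizes 1 MiB takes.
steps = growth_steps(8 * 1024, 1024 * 1024)
print(len(steps) - 1, "resizes, final size", steps[-1])
```

Each step grows the buffer by roughly one eighth, so reaching a large output from a small initial buffer takes many resizes.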