[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-09-15 Thread Ruben Vorderman
Ruben Vorderman added the comment: Hi, thanks all for the comments and the help. I have created the bindings using Cython. The project is still a work in progress as of this moment. I leave the link here for future reference. Special thanks to the Cython developers for enabling

[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-08-18 Thread Ruben Vorderman
Ruben Vorderman added the comment: I just found out that libdeflate does not support streaming: https://github.com/ebiggers/libdeflate/issues/73. I should have read the manual better. So that explains the memory usage. Because of that I don't think it is suitable for usage in CPython

[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-08-18 Thread Ruben Vorderman
Ruben Vorderman added the comment: > That might be an option then. CPython could use the existing library if it is > available. Dynamic linking indeed seems like a great option here! Users who care about this will probably have the 'isal' and 'libdeflateO' packages ins

[issue41586] Allow to set pipe size on subprocess.Popen.

2020-08-18 Thread Ruben Vorderman
New submission from Ruben Vorderman : Pipes block when reading from an empty pipe or when writing to a full pipe. When this happens, the program waiting on the pipe still uses a lot of CPU cycles until the pipe stops blocking. I found this while working with xopen. A library

[issue41586] Allow to set pipe size on subprocess.Popen.

2020-08-19 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +21035 stage: -> patch review pull_request: https://github.com/python/cpython/pull/21921 ___ Python tracker <https://bugs.python.org/issu

[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-08-20 Thread Ruben Vorderman
Ruben Vorderman added the comment: > Within the stdlib, I'd focus only on using things that can be used in a 100% > api compatible way with the existing modules. > Otherwise creating a new module and putting it up on PyPI to expose the > functionality from the libraries you want

[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-08-24 Thread Ruben Vorderman
Ruben Vorderman added the comment: > If you take this route, please don't write it directly against the CPython > C-API (as you would for a CPython stdlib module). Thanks for reminding me of this. I was planning to take the laziest route possible anyway, reusing as much code from c

[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-08-17 Thread Ruben Vorderman
New submission from Ruben Vorderman : The gzip file format is quite ubiquitous and so is its first (?) free/libre implementation zlib with the gzip command line tool. This uses the DEFLATE algorithm. Lately some faster algorithms (most notably zstd) have popped up which have better speed

[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-08-17 Thread Ruben Vorderman
Ruben Vorderman added the comment: This has to be in a PEP. I am sorry I misplaced it on the bugtracker. -- resolution: -> not a bug stage: -> resolved status: open -> closed

[issue41566] Include much faster DEFLATE implementations in Python's gzip and zlib libraries. (isa-l)

2020-08-17 Thread Ruben Vorderman
Ruben Vorderman added the comment: nasm or yasm will work. I only have experience building it with nasm. But yes that is indeed a dependency. Personally I do not see the problem with adding nasm as a build dependency, as it opens up possibilities for even more performance optimizations

[issue43612] zlib.compress should have a wbits argument

2021-04-26 Thread Ruben Vorderman
Change by Ruben Vorderman : -- versions: +Python 3.11

[issue43612] zlib.compress should have a wbits argument

2021-04-26 Thread Ruben Vorderman
Ruben Vorderman added the comment: A patch was created, but has not been reviewed yet.

[issue43612] zlib.compress should have a wbits argument

2021-03-24 Thread Ruben Vorderman
New submission from Ruben Vorderman : zlib.compress can currently only be used to output zlib blocks. Arguably `zlib.compress(my_data, level, wbits=-15)` is even more useful as it gives you a raw deflate block. That is quite interesting if you are writing your own file format and want to use
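
As a sketch of what a raw deflate block looks like in practice, the following uses zlib.compressobj, which already accepts a negative wbits value, to produce the output the proposed `zlib.compress(..., wbits=-15)` call would give. The helper name compress_raw_deflate and the sample payload are assumptions made for this example.

```python
import zlib


def compress_raw_deflate(data, level=6):
    """Compress data into a raw DEFLATE stream (no zlib header or checksum)."""
    # A negative wbits value selects raw deflate output.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


payload = b"example payload " * 64
raw = compress_raw_deflate(payload)
# Decompression must be told about the raw format as well.
restored = zlib.decompress(raw, wbits=-15)
```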

[issue43613] gzip.compress and gzip.decompress are sub-optimally implemented.

2021-03-24 Thread Ruben Vorderman
New submission from Ruben Vorderman : When working on python-isal, which aims to provide faster drop-in replacements for the zlib and gzip modules, I found that gzip.compress and gzip.decompress are suboptimally implemented, which hurts performance. gzip.compress and gzip.decompress both do

[issue43613] gzip.compress and gzip.decompress are sub-optimally implemented.

2021-03-24 Thread Ruben Vorderman
Change by Ruben Vorderman : -- type: -> performance

[issue43612] zlib.compress should have a wbits argument

2021-03-24 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +23768 stage: -> patch review pull_request: https://github.com/python/cpython/pull/25011

[issue43612] zlib.compress should have a wbits argument

2021-03-24 Thread Ruben Vorderman
Change by Ruben Vorderman : -- type: -> enhancement

[issue43613] gzip.compress and gzip.decompress are sub-optimally implemented.

2021-03-25 Thread Ruben Vorderman
Ruben Vorderman added the comment: I created bpo-43621 for the error issue. There should only be BadGzipFile. Once that is fixed, having only one error type will make it easier to implement some functions that are shared across the gzip.py codebase

[issue43621] gzip._GzipReader should only throw BadGzipFile errors

2021-03-25 Thread Ruben Vorderman
Change by Ruben Vorderman : -- type: -> behavior

[issue43621] gzip._GzipReader should only throw BadGzipFile errors

2021-03-25 Thread Ruben Vorderman
New submission from Ruben Vorderman : This is properly documented: https://docs.python.org/3/library/gzip.html#gzip.BadGzipFile . It now throws EOFErrors when a stream is truncated. But this means that upstream both BadGzipFile and EOFError need to be caught in the exception handling when
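
The situation described above can be sketched as follows: calling code has to catch two unrelated exception types to handle every kind of bad input. The helper name try_decompress and the sample inputs are hypothetical; gzip.BadGzipFile requires Python 3.8 or later.

```python
import gzip


def try_decompress(blob):
    """Return the decompressed bytes, or None if the input is not usable."""
    try:
        return gzip.decompress(blob)
    except (gzip.BadGzipFile, EOFError):
        # BadGzipFile: the header is invalid; EOFError: the stream is truncated.
        return None


good = gzip.compress(b"x" * 1000)
truncated = good[:-10]  # cuts off the trailer and the end of the deflate data
not_gzip = b"plain text, not gzip"
```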

[issue43612] zlib.compress should have a wbits argument

2021-03-25 Thread Ruben Vorderman
Change by Ruben Vorderman : -- components: +Extension Modules -Library (Lib)

[issue43317] python -m gzip could use a larger buffer

2021-02-24 Thread Ruben Vorderman
New submission from Ruben Vorderman : python -m gzip reads in chunks of 1024 bytes: https://github.com/python/cpython/blob/1f433406bd46fbd00b88223ad64daea6bc9eaadc/Lib/gzip.py#L599 This hurts performance somewhat. Using io.DEFAULT_BUFFER_SIZE will improve it. Also 'io.DEFAULT_BUFFER_SIZE
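
A minimal sketch of the kind of read loop being discussed, with the chunk size taken from io.DEFAULT_BUFFER_SIZE instead of a hard-coded 1024. The function name gzip_copy is an assumption for this example and is not the code in Lib/gzip.py.

```python
import gzip
import io


def gzip_copy(src, dst, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Compress everything readable from src into dst, one chunk at a time."""
    with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            gz.write(chunk)


source = io.BytesIO(b"abc" * 100_000)
target = io.BytesIO()
gzip_copy(source, target)
```

Fewer, larger reads mean fewer Python-level calls per byte, which is where the small chunk size costs time.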

[issue43316] python -m gzip handles error incorrectly

2021-02-24 Thread Ruben Vorderman
New submission from Ruben Vorderman : `python -m gzip -d myfile` will throw an error because myfile does not end in '.gz'. That is fair (even though a bit redundant, GzipFile contains a header check, so why bother checking the extension?). The problem is how this error is thrown. 1. Error

[issue43317] python -m gzip could use a larger buffer

2021-02-24 Thread Ruben Vorderman
Change by Ruben Vorderman : -- type: -> performance

[issue43316] python -m gzip handles error incorrectly

2021-02-24 Thread Ruben Vorderman
Change by Ruben Vorderman : -- type: -> behavior

[issue43317] python -m gzip could use a larger buffer

2021-02-25 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +23430 stage: -> patch review pull_request: https://github.com/python/cpython/pull/24645

[issue43316] python -m gzip handles error incorrectly

2021-02-25 Thread Ruben Vorderman
Ruben Vorderman added the comment: That sounds perfect, I didn't think of that. I will make a PR.

[issue43316] python -m gzip handles error incorrectly

2021-02-25 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +23432 stage: -> patch review pull_request: https://github.com/python/cpython/pull/24647

[issue43612] zlib.compress should have a wbits argument

2021-08-25 Thread Ruben Vorderman
Change by Ruben Vorderman : -- pull_requests: +26387 pull_request: https://github.com/python/cpython/pull/27941

[issue43613] gzip.compress and gzip.decompress are sub-optimally implemented.

2021-08-25 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +26386 stage: -> patch review pull_request: https://github.com/python/cpython/pull/27941

[issue43613] gzip.compress and gzip.decompress are sub-optimally implemented.

2021-09-03 Thread Ruben Vorderman
Change by Ruben Vorderman : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed

[issue43612] zlib.compress should have a wbits argument

2021-09-03 Thread Ruben Vorderman
Ruben Vorderman added the comment: Thanks for the review, Lukasz! It was fun to create the PR and optimize the performance for gzip.py as well.

[issue43613] gzip.compress and gzip.decompress are sub-optimally implemented.

2021-09-03 Thread Ruben Vorderman
Ruben Vorderman added the comment: Issue was solved by moving code from _GzipReader to separate functions and maintaining the same error structure. This solved the problem with maximum code reuse and full backwards compatibility.

[issue45507] Small oversight in 3.11 gzip.decompress implementation with regards to backwards compatibility

2021-10-18 Thread Ruben Vorderman
New submission from Ruben Vorderman : A 'struct.error: unpack requires a buffer of 8 bytes' is thrown when a gzip trailer is truncated, instead of an EOFError as in 3.10 and prior releases. -- components: Library (Lib) messages: 404165 nosy: rhpvorderman priority: normal
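
To illustrate the difference between the two errors, here is a sketch of a trailer reader that checks the length before unpacking, so a short trailer surfaces as EOFError and never reaches struct.unpack. The function name read_gzip_trailer is hypothetical and this is not the patch from the pull request.

```python
import struct
import zlib


def read_gzip_trailer(trailer):
    """Return (crc32, isize) from an 8-byte gzip trailer, or raise EOFError."""
    if len(trailer) < 8:
        # Without this check, struct.unpack would raise struct.error instead.
        raise EOFError(
            "Compressed file ended before the end-of-stream marker was reached"
        )
    return struct.unpack("<II", trailer[:8])


data = b"hello"
trailer = struct.pack("<II", zlib.crc32(data), len(data))
crc, isize = read_gzip_trailer(trailer)
```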

[issue45507] Small oversight in 3.11 gzip.decompress implementation with regards to backwards compatibility

2021-10-18 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +27296 stage: -> patch review pull_request: https://github.com/python/cpython/pull/29023

[issue45507] Small oversight in 3.11 gzip.decompress implementation with regards to backwards compatibility

2021-10-18 Thread Ruben Vorderman
Change by Ruben Vorderman : -- pull_requests: +27301 pull_request: https://github.com/python/cpython/pull/29029

[issue45509] Gzip header corruption not properly checked.

2021-10-18 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +27300 stage: -> patch review pull_request: https://github.com/python/cpython/pull/29028

[issue45507] Small oversight in 3.11 gzip.decompress implementation with regards to backwards compatibility

2021-10-18 Thread Ruben Vorderman
Ruben Vorderman added the comment: It turns out there is a bug where FNAME and/or FCOMMENT flags are set in the header, but no error is thrown when NAME and COMMENT fields are missing.

[issue45509] Gzip header corruption not properly checked.

2021-10-18 Thread Ruben Vorderman
New submission from Ruben Vorderman : The following headers are currently allowed while being wrong: - Headers with FCOMMENT flag set, but with incomplete or missing COMMENT bytes. - Headers with FNAME flag set, but with incomplete or missing NAME bytes - Headers with FHCRC set, the crc
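
The checks listed above can be sketched as a strict header parser based on the layout in RFC 1952. This is an illustration only, not the code from the pull request; the function name parse_gzip_header and the choice of EOFError and ValueError as exception types are assumptions.

```python
import struct
import zlib

# Flag bits from RFC 1952, section 2.3.1.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16


def parse_gzip_header(data):
    """Return the header length of one gzip member, raising on corrupt headers."""
    if len(data) < 10:
        raise EOFError("truncated gzip header")
    magic, method, flags = struct.unpack_from("<HBB", data, 0)
    if magic != 0x8B1F:  # bytes 0x1f 0x8b read as a little-endian integer
        raise ValueError("not a gzip header")
    if method != 8:
        raise ValueError("unknown compression method")
    pos = 10
    if flags & FEXTRA:
        if len(data) < pos + 2:
            raise EOFError("truncated FEXTRA length")
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
        if len(data) < pos:
            raise EOFError("truncated FEXTRA field")
    for flag in (FNAME, FCOMMENT):
        if flags & flag:
            # Both fields are zero-terminated; a missing terminator is an error.
            end = data.find(b"\x00", pos)
            if end == -1:
                raise EOFError("missing terminator for NAME or COMMENT")
            pos = end + 1
    if flags & FHCRC:
        if len(data) < pos + 2:
            raise EOFError("truncated header CRC")
        (stored,) = struct.unpack_from("<H", data, pos)
        # CRC16 is the low 16 bits of the CRC-32 over the preceding header bytes.
        if stored != (zlib.crc32(data[:pos]) & 0xFFFF):
            raise ValueError("header CRC mismatch")
        pos += 2
    return pos


base = b"\x1f\x8b\x08" + bytes([FNAME]) + b"\x00" * 4 + b"\x00\xff"
valid_header = base + b"file.txt\x00"
broken_header = base + b"file.txt"  # FNAME set, terminator missing
header_length = parse_gzip_header(valid_header)
```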

[issue45387] GzipFile.write should be buffered

2021-10-06 Thread Ruben Vorderman
Change by Ruben Vorderman : -- components: +Library (Lib) type: -> performance versions: +Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 3.9

[issue45387] GzipFile.write should be buffered

2021-10-06 Thread Ruben Vorderman
New submission from Ruben Vorderman : Please consider the following code snippet: import gzip import sys with gzip.open(sys.argv[1], "rt") as in_file_h: with gzip.open(sys.argv[2], "wt", compresslevel=1) as out_file_h: f
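
One possible way to get buffered writes from calling code is to wrap the GzipFile in io.BufferedWriter, so many small writes are collected before they reach the compressor. This is a sketch under that assumption, not the change proposed for the standard library; the helper name open_buffered_gzip_writer and the 128 KiB buffer size are hypothetical.

```python
import gzip
import io


def open_buffered_gzip_writer(fileobj, compresslevel=1, buffer_size=128 * 1024):
    """Return a writer that batches small writes before compressing them."""
    raw = gzip.GzipFile(fileobj=fileobj, mode="wb", compresslevel=compresslevel)
    # Closing the BufferedWriter flushes it and closes the GzipFile,
    # which in turn writes the gzip trailer.
    return io.BufferedWriter(raw, buffer_size=buffer_size)


sink = io.BytesIO()
with open_buffered_gzip_writer(sink) as out:
    for i in range(1000):
        out.write(b"line %d\n" % i)
```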

[issue24301] gzip module failing to decompress valid compressed file

2021-12-31 Thread Ruben Vorderman
Ruben Vorderman added the comment: ping

[issue46267] gzip.compress incorrectly ignores level parameter

2022-01-05 Thread Ruben Vorderman
New submission from Ruben Vorderman : def compress(data, compresslevel=_COMPRESS_LEVEL_BEST, *, mtime=None): """Compress data in one shot and return the compressed string. compresslevel sets the compression level in range of 0-9. mtime can be used to set the mo
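
One way to check that a compression level is actually honoured is to compare level 0 (stored, no compression) with level 9 on highly repetitive input. This is an illustrative check written for this listing, not taken from the report; the function name level_is_respected is hypothetical.

```python
import gzip


def level_is_respected(data):
    """Return True if compresslevel visibly changes the output size."""
    stored = gzip.compress(data, compresslevel=0)
    best = gzip.compress(data, compresslevel=9)
    # Level 0 only adds framing, so it is larger than the input;
    # level 9 should shrink repetitive input considerably.
    return len(best) < len(data) < len(stored)


sample = b"spam and eggs " * 10_000
respected = level_is_respected(sample)
```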

[issue46267] gzip.compress incorrectly ignores level parameter

2022-01-05 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +28622 stage: -> patch review pull_request: https://github.com/python/cpython/pull/30416

[issue45509] Gzip header corruption not properly checked.

2021-11-22 Thread Ruben Vorderman
Ruben Vorderman added the comment: I increased the performance of the patch. I added the file used for benchmarking. I also test the FHCRC changes now. The benchmark tests headers with different flags concatenated to a DEFLATE block with no data and a gzip trailer. The data is fed

[issue45875] gzip.decompress performance can be improved with memoryviews

2021-11-22 Thread Ruben Vorderman
Ruben Vorderman added the comment: Tried and failed. It seems that the overhead of creating a new memoryview object beats the performance gained by it.

[issue45875] gzip.decompress performance can be improved with memoryviews

2021-11-22 Thread Ruben Vorderman
New submission from Ruben Vorderman : The current implementation uses a lot of bytestring slicing. While it is much better than the 3.10 and earlier implementations, it can still be further improved by using memoryviews instead. Possibly. I will check this out. -- components
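
The idea can be sketched as follows: slicing a bytes object copies the sliced data, while slicing a memoryview only creates a new view on the same buffer. The function names are hypothetical, and whether the view version is faster has to be measured, since creating the memoryview has a cost of its own.

```python
def split_with_bytes(buf, header_len):
    """Split using bytes slicing; each slice is a new copy of the data."""
    return buf[:header_len], buf[header_len:]


def split_with_view(buf, header_len):
    """Split using memoryview slicing; no payload bytes are copied."""
    view = memoryview(buf)
    return view[:header_len], view[header_len:]


blob = b"HEADER" + b"payload" * 1000
head_copy, body_copy = split_with_bytes(blob, 6)
head_view, body_view = split_with_view(blob, 6)
```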

[issue45509] Gzip header corruption not properly checked.

2021-11-24 Thread Ruben Vorderman
Ruben Vorderman added the comment: I have found that using the timeit module provides more precise measurements: For a simple gzip header. (As returned by gzip.compress or zlib.compress with wbits=31) ./python -m timeit -s "import io; data = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0

[issue24301] gzip module failing to decompress valid compressed file

2021-11-29 Thread Ruben Vorderman
Ruben Vorderman added the comment: From the spec: https://datatracker.ietf.org/doc/html/rfc1952 2.2. File format A gzip file consists of a series of "members" (compressed data sets). The format of each member is specified in the following section. The m
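
A short illustration of the "series of members" wording in the quoted spec: two independently compressed members placed back to back form one valid gzip stream. The sample data is made up for this example.

```python
import gzip

first = gzip.compress(b"first member\n")
second = gzip.compress(b"second member\n")

# Concatenated members decompress to the concatenation of their contents.
combined = gzip.decompress(first + second)
```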

[issue45875] gzip.decompress performance can be improved with memoryviews

2021-11-29 Thread Ruben Vorderman
Change by Ruben Vorderman : -- stage: -> resolved status: open -> closed

[issue24301] gzip module failing to decompress valid compressed file

2021-11-29 Thread Ruben Vorderman
Ruben Vorderman added the comment: Whoops. Sorry, I spoke before my turn. If gzip implements it, it seems only logical that python's *gzip* module should too. I believe it can be fixed quite easily. The code should raise a warning though. I will make a PR

[issue45902] Bytes and bytesarrays can be sorted with a much faster count sort.

2021-11-26 Thread Ruben Vorderman
Ruben Vorderman added the comment: Also I didn't know if this should be in Component C-API or Interpreter Core. But I guess this will be implemented as C-API calls PyBytes_Sort and PyByteArray_SortInplace so I figured C-API is the correct component here

[issue45902] Bytes and bytesarrays can be sorted with a much faster count sort.

2021-11-26 Thread Ruben Vorderman
Ruben Vorderman added the comment: I changed the cython script a bit to use a more naive implementation without memset. Now it is always significantly faster than bytes(sorted(my_bytes)). $ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'')" 50 l

[issue45902] Bytes and bytesarrays can be sorted with a much faster count sort.

2021-11-26 Thread Ruben Vorderman
Ruben Vorderman added the comment: Sorry for the spam. I see I made a typo in the timeit script. Next time I will be more diligent when making these kinds of reports, triple-check them beforehand, and send them once. I used -c instead of -s and now all the setup time is also included

[issue45902] Bytes and bytesarrays can be sorted with a much faster count sort.

2021-11-26 Thread Ruben Vorderman
New submission from Ruben Vorderman : Python now uses the excellent timsort for most (all?) of its sorting. But this is not the fastest sort available for one particular use case. If the number of possible values in the array is limited, it is possible to perform a counting sort: https
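
A pure-Python sketch of a counting sort over byte values, to show the technique. It is not the Cython implementation benchmarked in this thread, and the function name counting_sort_bytes is an assumption.

```python
def counting_sort_bytes(data):
    """Sort a bytes-like object by counting how often each byte value occurs."""
    counts = [0] * 256  # one slot per possible byte value
    for value in data:
        counts[value] += 1
    # Emit each value as many times as it was seen, in ascending order.
    return b"".join(bytes([value]) * count for value, count in enumerate(counts))


sample = b"the quick brown fox jumps over the lazy dog"
sorted_sample = counting_sort_bytes(sample)
```

Because only 256 values are possible, the work is one pass over the input plus one pass over the table, with no comparisons between elements.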

[issue45902] Bytes and bytesarrays can be sorted with a much faster count sort.

2021-11-26 Thread Ruben Vorderman
Ruben Vorderman added the comment: I used it for the median calculation of FASTQ quality scores (https://en.wikipedia.org/wiki/FASTQ_format). But in the end I used the frequency table to calculate the median more quickly. So as you say, the frequency table turned out to be more useful
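
The frequency-table approach mentioned above can be sketched like this: once the counts are known, the median is found by walking the table until the middle position is reached, with no sort at all. The function names and the sample quality string are hypothetical, and the sketch returns the lower median for even-length input.

```python
import statistics


def byte_counts(data):
    """Return a 256-entry table with the number of occurrences per byte value."""
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    return counts


def median_low_from_counts(counts):
    """Return the lower median of the values described by a frequency table."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no data")
    target = (total - 1) // 2  # zero-based index of the lower median
    seen = 0
    for value, count in enumerate(counts):
        seen += count
        if seen > target:
            return value


qualities = b"IIIIHHGF#FFGHI"
median_quality = median_low_from_counts(byte_counts(qualities))
expected_median = statistics.median_low(qualities)
```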

[issue45509] Gzip header corruption not properly checked.

2021-11-22 Thread Ruben Vorderman
Ruben Vorderman added the comment: 1. Quite a lot. I tested it for the two most common use cases. import timeit import statistics WITH_FNAME = """ from gzip import GzipFile, decompress import io fileobj = io.BytesIO() g = GzipFile(fileobj=fileobj, mode='wb', filename='co

[issue45509] Gzip header corruption not properly checked.

2021-11-22 Thread Ruben Vorderman
Ruben Vorderman added the comment: Ping

[issue45509] Gzip header corruption not properly checked.

2021-11-05 Thread Ruben Vorderman
Ruben Vorderman added the comment: Bump. This is a bug that allows corrupted gzip files to be processed without error. Therefore I bump this issue in the hopes someone will review the PR.

[issue45507] Small oversight in 3.11 gzip.decompress implementation with regards to backwards compatibility

2021-11-05 Thread Ruben Vorderman
Ruben Vorderman added the comment: bump. This is a regression introduced by https://github.com/python/cpython/pull/27941

[issue24301] gzip module failing to decompress valid compressed file

2021-11-29 Thread Ruben Vorderman
Change by Ruben Vorderman : -- keywords: +patch pull_requests: +28076 stage: -> patch review pull_request: https://github.com/python/cpython/pull/29847

[issue46267] gzip.compress incorrectly ignores level parameter

2022-03-02 Thread Ruben Vorderman
Ruben Vorderman added the comment: ping