[Python-checkins] gh-132983: Add documentation for compression.zstd (GH-133911)

gpshead Wed, 21 May 2025 07:55:51 -0700

https://github.com/python/cpython/commit/d862b6de1bcff01229318d7138ddaab755723a43
commit: d862b6de1bcff01229318d7138ddaab755723a43
branch: main
author: Emma Smith <[email protected]>
committer: gpshead <[email protected]>
date: 2025-05-21T07:18:21-07:00
summary:


gh-132983: Add documentation for compression.zstd (GH-133911)

Add documentation for compression & compression.zstd.

🎉

---------

Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Stan Ulbrych <[email protected]>
Co-authored-by: Hugo van Kemenade <[email protected]>
Co-authored-by: Sumana Harihareswara <[email protected]>
Co-authored-by: Martin Panter <[email protected]>

files:
A Doc/library/compression.rst
A Doc/library/compression.zstd.rst
M Doc/library/archiving.rst

diff --git a/Doc/library/archiving.rst b/Doc/library/archiving.rst
index c9284949af4972..da0b3f8c3e7693 100644
--- a/Doc/library/archiving.rst
+++ b/Doc/library/archiving.rst
@@ -5,13 +5,15 @@ Data Compression and Archiving
 ******************************
 
 The modules described in this chapter support data compression with the zlib,
-gzip, bzip2 and lzma algorithms, and the creation of ZIP- and tar-format
+gzip, bzip2, lzma, and zstd algorithms, and the creation of ZIP- and tar-format
 archives.  See also :ref:`archiving-operations` provided by the :mod:`shutil`
 module.
 
 
 .. toctree::
 
+   compression.rst
+   compression.zstd.rst
    zlib.rst
    gzip.rst
    bz2.rst
diff --git a/Doc/library/compression.rst b/Doc/library/compression.rst
new file mode 100644
index 00000000000000..618b4a3c2bd170
--- /dev/null
+++ b/Doc/library/compression.rst
@@ -0,0 +1,18 @@
+The :mod:`!compression` package
+===============================
+
+.. versionadded:: 3.14
+
+The :mod:`!compression` package contains the canonical compression modules
+containing interfaces to several different compression algorithms. Some of
+these modules have historically been available as separate modules; those will
+continue to be available under their original names for compatibility reasons,
+and will not be removed without a deprecation cycle. The use of modules in
+:mod:`!compression` is encouraged where practical.
+
+* :mod:`!compression.bz2` -- Re-exports :mod:`bz2`
+* :mod:`!compression.gzip` -- Re-exports :mod:`gzip`
+* :mod:`!compression.lzma` -- Re-exports :mod:`lzma`
+* :mod:`!compression.zlib` -- Re-exports :mod:`zlib`
+* :mod:`compression.zstd` -- Wrapper for the Zstandard compression library
+
diff --git a/Doc/library/compression.zstd.rst b/Doc/library/compression.zstd.rst
new file mode 100644
index 00000000000000..1e1802155a19ec
--- /dev/null
+++ b/Doc/library/compression.zstd.rst
@@ -0,0 +1,840 @@
+:mod:`!compression.zstd` --- Compression compatible with the Zstandard format
+=============================================================================
+
+.. module:: compression.zstd
+   :synopsis: Low-level interface to compression and decompression routines in
+              the zstd library.
+
+.. versionadded:: 3.14
+
+**Source code:** :source:`Lib/compression/zstd/__init__.py`
+
+--------------
+
+This module provides classes and functions for compressing and decompressing
+data using the Zstandard (or *zstd*) compression algorithm. The
+`zstd manual <https://facebook.github.io/zstd/doc/api_manual_latest.html>`__
+describes Zstandard as "a fast lossless compression algorithm, targeting
+real-time compression scenarios at zlib-level and better compression ratios."
+Also included is a file interface that supports reading and writing the
+contents of ``.zst`` files created by the :program:`zstd` utility, as well as
+raw zstd compressed streams.
+
+The :mod:`!compression.zstd` module contains:
+
+* The :func:`.open` function and :class:`ZstdFile` class for reading and
+  writing compressed files.
+* The :class:`ZstdCompressor` and :class:`ZstdDecompressor` classes for
+  incremental (de)compression.
+* The :func:`compress` and :func:`decompress` functions for one-shot
+  (de)compression.
+* The :func:`train_dict` and :func:`finalize_dict` functions and the
+  :class:`ZstdDict` class to train and manage Zstandard dictionaries.
+* The :class:`CompressionParameter`, :class:`DecompressionParameter`, and
+  :class:`Strategy` classes for setting advanced (de)compression parameters.
+
+
+Exceptions
+----------
+
+.. exception:: ZstdError
+
+   This exception is raised when an error occurs during compression or
+   decompression, or while initializing the (de)compressor state.
+
+
+Reading and writing compressed files
+------------------------------------
+
+.. function:: open(file, /, mode='rb', *, level=None, options=None, \
+                   zstd_dict=None, encoding=None, errors=None, newline=None)
+
+   Open a Zstandard-compressed file in binary or text mode, returning a
+   :term:`file object`.
+
+   The *file* argument can be either a file name (given as a
+   :class:`str`, :class:`bytes` or :term:`path-like <path-like object>`
+   object), in which case the named file is opened, or it can be an existing
+   file object to read from or write to.
+
+   The *mode* argument can be either ``'rb'`` for reading (default), ``'wb'`` for
+   overwriting, ``'ab'`` for appending, or ``'xb'`` for exclusive creation.
+   These can equivalently be given as ``'r'``, ``'w'``, ``'a'``, and ``'x'``
+   respectively. You may also open in text mode with ``'rt'``, ``'wt'``,
+   ``'at'``, and ``'xt'`` respectively.
+
+   When reading, the *options* argument can be a dictionary providing advanced
+   decompression parameters; see :class:`DecompressionParameter` for detailed
+   information about supported
+   parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be
+   used during decompression. When reading, if the *level*
+   argument is not None, a :exc:`!TypeError` will be raised.
+
+   When writing, the *options* argument can be a dictionary
+   providing advanced compression parameters; see
+   :class:`CompressionParameter` for detailed information about supported
+   parameters. The *level* argument is the compression level to use when
+   writing compressed data. Only one of *level* or *options* may be non-None.
+   The *zstd_dict* argument is a :class:`ZstdDict` instance to be used during
+   compression.
+
+   In binary mode, this function is equivalent to the :class:`ZstdFile`
+   constructor: ``ZstdFile(file, mode, ...)``. In this case, the
+   *encoding*, *errors*, and *newline* parameters must not be provided.
+
+   In text mode, a :class:`ZstdFile` object is created, and wrapped in an
+   :class:`io.TextIOWrapper` instance with the specified encoding, error
+   handling behavior, and line endings.
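A minimal sketch of the text-mode behaviour described above, assuming Python 3.14 or later built with zstd support. The helper name `roundtrip_text` and the file name `example.zst` are made up for illustration, and the import guard only keeps the snippet importable on interpreters that lack the module.

```python
import os
import tempfile

try:
    from compression import zstd  # new in Python 3.14
except ImportError:  # older Python, or a build without zstd support
    zstd = None


def roundtrip_text(text, level=None):
    """Write *text* to a temporary .zst file in text mode and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.zst")  # hypothetical file name
        # 'wt' wraps a ZstdFile in io.TextIOWrapper; level only applies to writing.
        with zstd.open(path, "wt", level=level, encoding="utf-8") as f:
            f.write(text)
        # level is left out here: passing it while reading raises TypeError.
        with zstd.open(path, "rt", encoding="utf-8") as f:
            return f.read()


if zstd is not None:
    original = "first line\nsecond line\n"
    print(roundtrip_text(original, level=5) == original)
```

Binary modes work the same way but return a `ZstdFile` directly, so `encoding`, `errors`, and `newline` must be left out.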
+
+
+.. class:: ZstdFile(file, /, mode='rb', *, level=None, options=None, \
+                    zstd_dict=None)
+
+   Open a Zstandard-compressed file in binary mode.
+
+   A :class:`ZstdFile` can wrap an already-open :term:`file object`, or operate
+   directly on a named file. The *file* argument specifies either the file
+   object to wrap, or the name of the file to open (as a :class:`str`,
+   :class:`bytes` or :term:`path-like <path-like object>` object). If
+   wrapping an existing file object, the wrapped file will not be closed when
+   the :class:`ZstdFile` is closed.
+
+   The *mode* argument can be either ``'rb'`` for reading (default), ``'wb'``
+   for overwriting, ``'xb'`` for exclusive creation, or ``'ab'`` for appending.
+   These can equivalently be given as ``'r'``, ``'w'``, ``'x'`` and ``'a'``
+   respectively.
+
+   If *file* is a file object (rather than an actual file name), a mode of
+   ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``.
+
+   When reading, the *options* argument can be a dictionary
+   providing advanced decompression parameters; see
+   :class:`DecompressionParameter` for detailed information about supported
+   parameters. The *zstd_dict* argument is a :class:`ZstdDict` instance to be
+   used during decompression. When reading, if the *level*
+   argument is not None, a :exc:`!TypeError` will be raised.
+
+   When writing, the *options* argument can be a dictionary
+   providing advanced compression parameters; see
+   :class:`CompressionParameter` for detailed information about supported
+   parameters. The *level* argument is the compression level to use when
+   writing compressed data. Only one of *level* or *options* may be passed. The
+   *zstd_dict* argument is a :class:`ZstdDict` instance to be used during
+   compression.
+
+   :class:`!ZstdFile` supports all the members specified by
+   :class:`io.BufferedIOBase`, except for :meth:`~io.BufferedIOBase.detach`
+   and :meth:`~io.IOBase.truncate`.
+   Iteration and the :keyword:`with` statement are supported.
+
+   The following method and attributes are also provided:
+
+   .. method:: peek(size=-1)
+
+      Return buffered data without advancing the file position. At least one
+      byte of data will be returned, unless EOF has been reached. The exact
+      number of bytes returned is unspecified (the *size* argument is ignored).
+
+      .. note:: While calling :meth:`peek` does not change the file position of
+         the :class:`ZstdFile`, it may change the position of the underlying
+         file object (for example, if the :class:`ZstdFile` was constructed by
+         passing a file object for *file*).
+
+   .. attribute:: mode
+
+      ``'rb'`` for reading and ``'wb'`` for writing.
+
+   .. attribute:: name
+
+      The name of the Zstandard file. Equivalent to the :attr:`~io.FileIO.name`
+      attribute of the underlying :term:`file object`.
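The following sketch illustrates wrapping an existing file object, assuming Python 3.14 or later with zstd support. `compress_to_buffer` is a hypothetical helper, and an `io.BytesIO` stands in for any binary file object.

```python
import io

try:
    from compression import zstd  # new in Python 3.14
except ImportError:  # older Python, or a build without zstd support
    zstd = None


def compress_to_buffer(payload):
    """Compress *payload* into an in-memory buffer through ZstdFile."""
    raw = io.BytesIO()
    with zstd.ZstdFile(raw, "w") as f:
        f.write(payload)
    # Closing the ZstdFile leaves the wrapped file object open.
    return raw


if zstd is not None:
    raw = compress_to_buffer(b"parrot " * 100)
    print(raw.closed)
    raw.seek(0)
    with zstd.ZstdFile(raw) as f:
        head = f.peek()  # returns buffered data without moving the position
        print(f.read() == b"parrot " * 100)
```

Because the wrapped object stays open, the caller remains responsible for closing it.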
+
+
+Compressing and decompressing data in memory
+--------------------------------------------
+
+.. function:: compress(data, level=None, options=None, zstd_dict=None)
+
+   Compress *data* (a :term:`bytes-like object`), returning the compressed
+   data as a :class:`bytes` object.
+
+   The *level* argument is an integer controlling the level of
+   compression. *level* is an alternative to setting
+   :attr:`CompressionParameter.compression_level` in *options*. Use
+   :meth:`~CompressionParameter.bounds` on
+   :attr:`~CompressionParameter.compression_level` to get the values that can
+   be passed for *level*. If advanced compression options are needed, the
+   *level* argument must be omitted and in the *options* dictionary the
+   :attr:`!CompressionParameter.compression_level` parameter should be set.
+
+   The *options* argument is a Python dictionary containing advanced
+   compression parameters. The valid keys and values for compression parameters
+   are documented as part of the :class:`CompressionParameter` documentation.
+
+   The *zstd_dict* argument is an instance of :class:`ZstdDict`
+   containing trained data to improve compression efficiency. The
+   function :func:`train_dict` can be used to generate a Zstandard dictionary.
+
+
+.. function:: decompress(data, zstd_dict=None, options=None)
+
+   Decompress *data* (a :term:`bytes-like object`), returning the uncompressed
+   data as a :class:`bytes` object.
+
+   The *options* argument is a Python dictionary containing advanced
+   decompression parameters. The valid keys and values for decompression
+   parameters are documented as part of the :class:`DecompressionParameter`
+   documentation.
+
+   The *zstd_dict* argument is an instance of :class:`ZstdDict`
+   containing trained data used during compression. This must be
+   the same Zstandard dictionary used during compression.
+
+   If *data* is the concatenation of multiple distinct compressed frames,
+   decompress all of these frames, and return the concatenation of the results.
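As a rough illustration of the one-shot functions, the sketch below round-trips a payload at the lowest, default, and highest levels reported by `bounds()`, then decompresses two concatenated frames. It assumes Python 3.14 or later with zstd support; `roundtrip` is a hypothetical helper.

```python
try:
    from compression import zstd  # new in Python 3.14
except ImportError:  # older Python, or a build without zstd support
    zstd = None


def roundtrip(payload, level=None):
    """Compress and immediately decompress *payload* at the given level."""
    return zstd.decompress(zstd.compress(payload, level=level))


if zstd is not None:
    lower, upper = zstd.CompressionParameter.compression_level.bounds()
    payload = b"spam and eggs " * 500
    for level in (lower, zstd.COMPRESSION_LEVEL_DEFAULT, upper):
        print(level, roundtrip(payload, level) == payload)

    # Two independent frames, concatenated, decompress to the joined contents.
    two_frames = zstd.compress(b"spam") + zstd.compress(b"eggs")
    print(zstd.decompress(two_frames))
```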
+
+
+.. class:: ZstdCompressor(level=None, options=None, zstd_dict=None)
+
+   Create a compressor object, which can be used to compress data
+   incrementally.
+
+   For a more convenient way of compressing a single chunk of data, see the
+   module-level function :func:`compress`.
+
+   The *level* argument is an integer controlling the level of
+   compression. *level* is an alternative to setting
+   :attr:`CompressionParameter.compression_level` in *options*. Use
+   :meth:`~CompressionParameter.bounds` on
+   :attr:`~CompressionParameter.compression_level` to get the values that can
+   be passed for *level*. If advanced compression options are needed, the
+   *level* argument must be omitted and in the *options* dictionary the
+   :attr:`!CompressionParameter.compression_level` parameter should be set.
+
+   The *options* argument is a Python dictionary containing advanced
+   compression parameters. The valid keys and values for compression parameters
+   are documented as part of the :class:`CompressionParameter` documentation.
+
+   The *zstd_dict* argument is an optional instance of :class:`ZstdDict`
+   containing trained data to improve compression efficiency. The
+   function :func:`train_dict` can be used to generate a Zstandard dictionary.
+
+
+   .. method:: compress(data, mode=ZstdCompressor.CONTINUE)
+
+      Compress *data* (a :term:`bytes-like object`), returning a :class:`bytes`
+      object with compressed data if possible, or otherwise an empty
+      :class:`!bytes` object. Some of *data* may be buffered internally, for
+      use in later calls to :meth:`!compress` and :meth:`~.flush`. The returned
+      data should be concatenated with the output of any previous calls to
+      :meth:`~.compress`.
+
+      The *mode* argument is a :class:`ZstdCompressor` attribute, either
+      :attr:`~.CONTINUE`, :attr:`~.FLUSH_BLOCK`,
+      or :attr:`~.FLUSH_FRAME`.
+
+      When all data has been provided to the compressor, call the
+      :meth:`~.flush` method to finish the compression process. If
+      :meth:`~.compress` is called with *mode* set to :attr:`~.FLUSH_FRAME`,
+      :meth:`~.flush` should not be called, as it would write out a new empty
+      frame.
+
+   .. method:: flush(mode=ZstdCompressor.FLUSH_FRAME)
+
+      Finish the compression process, returning a :class:`bytes` object
+      containing any data stored in the compressor's internal buffers.
+
+      The *mode* argument is a :class:`ZstdCompressor` attribute, either
+      :attr:`~.FLUSH_BLOCK`, or :attr:`~.FLUSH_FRAME`.
+
+   .. attribute:: CONTINUE
+
+      Collect more data for compression, which may or may not generate output
+      immediately. This mode optimizes the compression ratio by maximizing the
+      amount of data per block and frame.
+
+   .. attribute:: FLUSH_BLOCK
+
+      Complete and write a block to the data stream. The data returned so far
+      can be immediately decompressed. Past data can still be referenced in
+      future blocks generated by calls to :meth:`~.compress`,
+      improving compression.
+
+   .. attribute:: FLUSH_FRAME
+
+      Complete and write out a frame. Future data provided to
+      :meth:`~.compress` will be written into a new frame and
+      *cannot* reference past data.
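One way to picture the three modes is a small streaming helper. This is only a sketch, assuming Python 3.14 or later with zstd support; `compress_chunks` is a hypothetical name, and flushing a block after every chunk is chosen for illustration even though it costs some compression ratio.

```python
try:
    from compression import zstd  # new in Python 3.14
except ImportError:  # older Python, or a build without zstd support
    zstd = None


def compress_chunks(chunks, level=None):
    """Compress an iterable of bytes chunks into a single Zstandard frame."""
    compressor = zstd.ZstdCompressor(level=level)
    pieces = []
    for chunk in chunks:
        # FLUSH_BLOCK emits everything fed so far, so a receiver can decode it
        # right away; later chunks may still reference this data.
        pieces.append(compressor.compress(chunk, zstd.ZstdCompressor.FLUSH_BLOCK))
    # flush() defaults to FLUSH_FRAME, which closes the frame.
    pieces.append(compressor.flush())
    return b"".join(pieces)


if zstd is not None:
    frame = compress_chunks([b"first chunk, ", b"second chunk"])
    print(zstd.decompress(frame))
```

With the default `CONTINUE` mode the per-chunk return value may be empty, because the compressor is free to buffer input until a later call.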
+
+
+.. class:: ZstdDecompressor(zstd_dict=None, options=None)
+
+   Create a decompressor object, which can be used to decompress data
+   incrementally.
+
+   For a more convenient way of decompressing an entire compressed stream at
+   once, see the module-level function :func:`decompress`.
+
+   The *options* argument is a Python dictionary containing advanced
+   decompression parameters. The valid keys and values for decompression
+   parameters are documented as part of the :class:`DecompressionParameter`
+   documentation.
+
+   The *zstd_dict* argument is an instance of :class:`ZstdDict`
+   containing trained data used during compression. This must be
+   the same Zstandard dictionary used during compression.
+
+   .. note::
+      This class does not transparently handle inputs containing multiple
+      compressed frames, unlike the :func:`decompress` function and
+      :class:`ZstdFile` class. To decompress a multi-frame input, you should
+      use :func:`decompress`, :class:`ZstdFile` if working with a
+      :term:`file object`, or multiple :class:`!ZstdDecompressor` instances.
+
+   .. method:: decompress(data, max_length=-1)
+
+      Decompress *data* (a :term:`bytes-like object`), returning
+      uncompressed data as bytes. Some of *data* may be buffered
+      internally, for use in later calls to :meth:`!decompress`.
+      The returned data should be concatenated with the output of any previous
+      calls to :meth:`!decompress`.
+
+      If *max_length* is non-negative, the method returns at most *max_length*
+      bytes of decompressed data. If this limit is reached and further
+      output can be produced, the :attr:`~.needs_input` attribute will
+      be set to ``False``. In this case, the next call to
+      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
+      more of the output.
+
+      If all of the input data was decompressed and returned (either
+      because this was less than *max_length* bytes, or because
+      *max_length* was negative), the :attr:`~.needs_input` attribute
+      will be set to ``True``.
+
+      Attempting to decompress data after the end of a frame will raise a
+      :exc:`ZstdError`. Any data found after the end of the frame is ignored
+      and saved in the :attr:`~.unused_data` attribute.
+
+   .. attribute:: eof
+
+      ``True`` if the end-of-stream marker has been reached.
+
+   .. attribute:: unused_data
+
+      Data found after the end of the compressed stream.
+
+      Before the end of the stream is reached, this will be ``b''``.
+
+   .. attribute:: needs_input
+
+      ``False`` if the :meth:`.decompress` method can provide more
+      decompressed data before requiring new compressed input.
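The interplay of *max_length*, `needs_input`, and `eof` can be sketched as a loop that caps the size of every output piece. The helper `decompress_bounded` and its default of 4096 bytes are illustrative assumptions; the snippet requires Python 3.14 or later with zstd support.

```python
try:
    from compression import zstd  # new in Python 3.14
except ImportError:  # older Python, or a build without zstd support
    zstd = None


def decompress_bounded(data, max_chunk=4096):
    """Decompress one frame, producing at most *max_chunk* bytes per call."""
    decompressor = zstd.ZstdDecompressor()
    pieces = [decompressor.decompress(data, max_chunk)]
    # needs_input stays False while buffered input can still yield output,
    # so keep passing b"" until the frame ends or more input is required.
    while not decompressor.eof and not decompressor.needs_input:
        pieces.append(decompressor.decompress(b"", max_chunk))
    return b"".join(pieces), decompressor.eof


if zstd is not None:
    payload = bytes(range(256)) * 100
    restored, finished = decompress_bounded(zstd.compress(payload), 1000)
    print(restored == payload, finished)
```

The loop stops as soon as `eof` becomes true, which avoids calling `decompress()` again after the end of the frame.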
+
+
+Zstandard dictionaries
+----------------------
+
+
+.. function:: train_dict(samples, dict_size)
+
+   Train a Zstandard dictionary, returning a :class:`ZstdDict` instance.
+   Zstandard dictionaries enable more efficient compression of smaller sizes
+   of data, which is traditionally difficult to compress due to less
+   repetition. If you are compressing multiple similar groups of data (such as
+   similar files), Zstandard dictionaries can improve compression ratios and
+   speed significantly.
+
+   The *samples* argument (an iterable of :class:`bytes` objects) is the
+   population of samples used to train the Zstandard dictionary.
+
+   The *dict_size* argument, an integer, is the maximum size (in bytes) the
+   Zstandard dictionary should be. The Zstandard documentation suggests an
+   absolute maximum of no more than 100 KB, but the maximum can often be smaller
+   depending on the data. Larger dictionaries generally slow down compression,
+   but improve compression ratios. Smaller dictionaries lead to faster
+   compression, but reduce the compression ratio.
+
+
+.. function:: finalize_dict(zstd_dict, /, samples, dict_size, level)
+
+   An advanced function for converting a "raw content" Zstandard dictionary into
+   a regular Zstandard dictionary. "Raw content" dictionaries are a sequence of
+   bytes that do not need to follow the structure of a normal Zstandard
+   dictionary.
+
+   The *zstd_dict* argument is a :class:`ZstdDict` instance with
+   the :attr:`~ZstdDict.dict_content` containing the raw dictionary contents.
+
+   The *samples* argument (an iterable of :class:`bytes` objects) contains
+   sample data for generating the Zstandard dictionary.
+
+   The *dict_size* argument, an integer, is the maximum size (in bytes) the
+   Zstandard dictionary should be. See :func:`train_dict` for
+   suggestions on the maximum dictionary size.
+
+   The *level* argument (an integer) is the compression level expected to be
+   passed to the compressors using this dictionary. The dictionary information
+   varies for each compression level, so tuning for the proper compression
+   level can make compression more efficient.
+
+
+.. class:: ZstdDict(dict_content, /, *, is_raw=False)
+
+   A wrapper around Zstandard dictionaries. Dictionaries can be used to improve
+   the compression of many small chunks of data. Use :func:`train_dict` if you
+   need to train a new dictionary from sample data.
+
+   The *dict_content* argument (a :term:`bytes-like object`) is the already
+   trained dictionary information.
+
+   The *is_raw* argument, a boolean, is an advanced parameter controlling the
+   meaning of *dict_content*. ``True`` means *dict_content* is a "raw content"
+   dictionary, without any format restrictions. ``False`` means *dict_content*
+   is an ordinary Zstandard dictionary, created from Zstandard functions,
+   for example, :func:`train_dict` or the external :program:`zstd` CLI.
+
+   When passing a :class:`!ZstdDict` to a function, the
+   :attr:`!as_digested_dict` and :attr:`!as_undigested_dict` attributes can
+   control how the dictionary is loaded by passing them as the ``zstd_dict``
+   argument, for example, ``compress(data, zstd_dict=zd.as_digested_dict)``.
+   Digesting a dictionary is a costly operation that occurs when loading a
+   Zstandard dictionary. When making multiple calls to compression or
+   decompression, passing a digested dictionary will reduce the overhead of
+   loading the dictionary.
+
+    .. list-table:: Difference for compression
+       :widths: 10 14 10
+       :header-rows: 1
+
+       * -
+         - Digested dictionary
+         - Undigested dictionary
+       * - Advanced parameters of the compressor which may be overridden by
+           the dictionary's parameters
+         - ``window_log``, ``hash_log``, ``chain_log``, ``search_log``,
+           ``min_match``, ``target_length``, ``strategy``,
+           ``enable_long_distance_matching``, ``ldm_hash_log``,
+           ``ldm_min_match``, ``ldm_bucket_size_log``, ``ldm_hash_rate_log``,
+           and some non-public parameters.
+         - None
+       * - :class:`!ZstdDict` internally caches the dictionary
+         - Yes. It's faster when loading a digested dictionary again with the
+           same compression level.
+         - No. If you wish to load an undigested dictionary multiple times,
+           consider reusing a compressor object.
+
+   If passing a :class:`!ZstdDict` without any attribute, an undigested
+   dictionary is passed by default when compressing and a digested dictionary
+   is generated if necessary and passed by default when decompressing.
+
+    .. attribute:: dict_content
+
+        The content of the Zstandard dictionary, a ``bytes`` object. It's the
+        same as the *dict_content* argument in the ``__init__`` method. It can
+        be used with other programs, such as the ``zstd`` CLI program.
+
+    .. attribute:: dict_id
+
+        Identifier of the Zstandard dictionary, a non-negative int value.
+
+        Non-zero means the dictionary is ordinary, created by Zstandard
+        functions and following the Zstandard format.
+
+        ``0`` means a "raw content" dictionary, free of any format restriction,
+        intended for advanced users.
+
+        .. note::
+
+            The meaning of ``0`` for :attr:`!ZstdDict.dict_id` is different
+            from the ``dictionary_id`` attribute to the :func:`get_frame_info`
+            function.
+
+    .. attribute:: as_digested_dict
+
+        Load as a digested dictionary.
+
+    .. attribute:: as_undigested_dict
+
+        Load as an undigested dictionary.
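A small sketch of a "raw content" dictionary, which avoids the training step altogether. The dictionary bytes, the sample record, and the helper `make_raw_dict` are invented for illustration; the snippet assumes Python 3.14 or later with zstd support.

```python
try:
    from compression import zstd  # new in Python 3.14
except ImportError:  # older Python, or a build without zstd support
    zstd = None

# Hypothetical raw content: bytes that resemble the records to be compressed.
RAW_CONTENT = b'{"user": "", "status": "active", "roles": []}'


def make_raw_dict(content=RAW_CONTENT):
    """Wrap arbitrary bytes as a "raw content" Zstandard dictionary."""
    return zstd.ZstdDict(content, is_raw=True)


if zstd is not None:
    zd = make_raw_dict()
    record = b'{"user": "alice", "status": "active", "roles": ["admin"]}'

    compressed = zstd.compress(record, zstd_dict=zd)
    print(zstd.decompress(compressed, zstd_dict=zd) == record)
    print(zd.dict_id)  # raw content dictionaries report an ID of 0

    # Explicitly request the digested form when compressing many records.
    digested = zstd.compress(record, zstd_dict=zd.as_digested_dict)
    print(zstd.decompress(digested, zstd_dict=zd) == record)
```

The same dictionary must be supplied for decompression, whichever form was used to compress.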
+
+
+Advanced parameter control
+--------------------------
+
+.. class:: CompressionParameter()
+
+   An :class:`~enum.IntEnum` containing the advanced compression parameter
+   keys that can be used when compressing data.
+
+   The :meth:`~.bounds` method can be used on any attribute to get the valid
+   values for that parameter.
+
+   Parameters are optional; any omitted parameter will have its value selected
+   automatically.
+
+   Example getting the lower and upper bound of :attr:`~.compression_level`::
+
+      lower, upper = CompressionParameter.compression_level.bounds()
+
+   Example setting the :attr:`~.window_log` to the maximum size::
+
+      _lower, upper = CompressionParameter.window_log.bounds()
+      options = {CompressionParameter.window_log: upper}
+      compress(b'venezuelan beaver cheese', options=options)
+
+   .. method:: bounds()
+
+      Return the tuple of int bounds, ``(lower, upper)``, of a compression
+      parameter. This method should be called on the attribute you wish to
+      retrieve the bounds of. For example, to get the valid values for
+      :attr:`~.compression_level`, one may check the result of
+      ``CompressionParameter.compression_level.bounds()``.
+
+      Both the lower and upper bounds are inclusive.
+
+   .. attribute:: compression_level
+
+      A high-level means of setting other compression parameters that affect
+      the speed and ratio of compressing data. Setting the level to zero uses
+      :attr:`COMPRESSION_LEVEL_DEFAULT`.
+
+   .. attribute:: window_log
+
+      Maximum allowed back-reference distance the compressor can use when
+      compressing data, expressed as power of two, ``1 << window_log`` bytes.
+      This parameter greatly influences the memory usage of compression. Higher
+      values require more memory but yield better compression ratios.
+
+      A value of zero causes the value to be selected automatically.
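The power-of-two relationship can be worked through in plain Python. Both helper names are hypothetical, and the result of `min_window_log` would still need to be checked against `CompressionParameter.window_log.bounds()` before use.

```python
def window_size_bytes(window_log):
    """Back-reference window in bytes for a given window_log."""
    return 1 << window_log


def min_window_log(distance):
    """Smallest window_log whose window covers *distance* bytes."""
    if distance < 1:
        raise ValueError("distance must be at least 1 byte")
    return (distance - 1).bit_length()


MIB = 1024 * 1024
print(window_size_bytes(27) // MIB)  # 128
print(min_window_log(128 * MIB))     # 27
```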
+
+   .. attribute:: hash_log
+
+      Size of the initial probe table, as a power of two. The resulting memory
+      usage is ``1 << (hash_log+2)`` bytes. Larger tables improve compression
+      ratio of strategies <= :attr:`~Strategy.dfast`, and improve compression
+      speed of strategies > :attr:`~Strategy.dfast`.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: chain_log
+
+      Size of the multi-probe search table, as a power of two. The resulting
+      memory usage is ``1 << (chain_log+2)`` bytes. Larger tables result in
+      better and slower compression. This parameter has no effect for the
+      :attr:`~Strategy.fast` strategy. It's still useful when using
+      :attr:`~Strategy.dfast` strategy, in which case it defines a secondary
+      probe table.
+
+      A value of zero causes the value to be selected automatically.
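The memory formula given for `hash_log` and `chain_log` can be evaluated directly. The helper name and the sample values 17 and 16 are arbitrary choices for illustration, not library defaults.

```python
def table_bytes(log_value):
    """Memory used by a probe or search table: 1 << (log_value + 2) bytes."""
    return 1 << (log_value + 2)


for name, log_value in (("hash_log", 17), ("chain_log", 16)):
    print(name, log_value, table_bytes(log_value) // 1024, "KiB")
```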
+
+   .. attribute:: search_log
+
+      Number of search attempts, as a power of two. More attempts result in
+      better and slower compression. This parameter is useless for
+      :attr:`~Strategy.fast` and :attr:`~Strategy.dfast` strategies.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: min_match
+
+      Minimum size of searched matches. Larger values increase compression and
+      decompression speed, but decrease ratio. Note that Zstandard can still
+      find matches of smaller size; it just tweaks its search algorithm to look
+      for this size and larger. For all strategies < :attr:`~Strategy.btopt`,
+      the effective minimum is ``4``; for all strategies
+      > :attr:`~Strategy.fast`, the effective maximum is ``6``.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: target_length
+
+      The impact of this field depends on the selected :class:`Strategy`.
+
+      For strategies :attr:`~Strategy.btopt`, :attr:`~Strategy.btultra` and
+      :attr:`~Strategy.btultra2`, the value is the length of a match
+      considered "good enough" to stop searching. Larger values improve the
+      compression ratio, but slow down compression.
+
+      For strategy :attr:`~Strategy.fast`, it is the distance between match
+      sampling. Larger values make compression faster, but with a worse
+      compression ratio.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: strategy
+
+      The higher the value of selected strategy, the more complex the
+      compression technique used by zstd, resulting in higher compression
+      ratios but slower compression.
+
+      .. seealso:: :class:`Strategy`
+
+   .. attribute:: enable_long_distance_matching
+
+      Long distance matching can be used to improve compression for large
+      inputs by finding large matches at greater distances. It increases memory
+      usage and window size.
+
+      ``True`` or ``1`` enable long distance matching while ``False`` or ``0``
+      disable it.
+
+      Enabling this parameter increases the default
+      :attr:`~CompressionParameter.window_log` to 27 (a 128 MiB window) unless
+      it is expressly set to a different value. This setting is enabled by
+      default if :attr:`!window_log` >= 27 (128 MiB) and the compression
+      strategy >= :attr:`~Strategy.btopt` (compression level 16+).
+
+   .. attribute:: ldm_hash_log
+
+      Size of the table for long distance matching, as a power of two. Larger
+      values increase memory usage and compression ratio, but decrease
+      compression speed.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: ldm_min_match
+
+      Minimum match size for long distance matcher. Values that are too large
+      or too small can often decrease the compression ratio.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: ldm_bucket_size_log
+
+      Log size of each bucket in the long distance matcher hash table for
+      collision resolution. Larger values improve collision resolution but
+      decrease compression speed.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: ldm_hash_rate_log
+
+      Frequency of inserting/looking up entries into the long distance matcher
+      hash table. Larger values improve compression speed. Deviating far from
+      the default value will likely result in a compression ratio decrease.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: checksum_flag
+
+      A four-byte checksum using XXHash64 of the uncompressed content is
+      written at the end of each frame. Zstandard's decompression code verifies
+      the checksum. If there is a mismatch a :class:`ZstdError` exception is
+      raised.
+
+      ``True`` or ``1`` enable checksum generation while ``False`` or ``0``
+      disable it.
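A sketch of what the checksum buys in practice: a frame written with `checksum_flag` enabled should fail to decompress once its trailing bytes are damaged. The helper names are hypothetical, and the snippet assumes Python 3.14 or later with zstd support.

```python
try:
    from compression import zstd  # new in Python 3.14
except ImportError:  # older Python, or a build without zstd support
    zstd = None


def compress_with_checksum(payload):
    """Compress *payload* and append a content checksum to the frame."""
    options = {zstd.CompressionParameter.checksum_flag: 1}
    return zstd.compress(payload, options=options)


def flip_last_byte(frame):
    """Return a copy of *frame* with its final byte inverted."""
    return frame[:-1] + bytes([frame[-1] ^ 0xFF])


if zstd is not None:
    frame = compress_with_checksum(b"norwegian blue " * 64)
    try:
        zstd.decompress(flip_last_byte(frame))
    except zstd.ZstdError as exc:
        print("corruption detected:", exc)
```

Note that *level* is not passed alongside *options* here, since only one of the two may be given.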
+
+   .. attribute:: dict_id_flag
+
+      When compressing with a :class:`ZstdDict`, the dictionary's ID is written
+      into the frame header.
+
+      ``True`` or ``1`` enable storing the dictionary ID while ``False`` or
+      ``0`` disable it.
+
+   .. attribute:: nb_workers
+
+      Select how many threads will be spawned to compress in parallel. A value
+      of :attr:`!nb_workers` > 0 enables multi-threaded compression; a value of
+      ``1`` means "one-thread multi-threaded mode". More workers improve speed,
+      but also increase memory usage and slightly reduce compression ratio.
+
+      A value of zero disables multi-threading.
+
+   .. attribute:: job_size
+
+      Size of a compression job, in bytes. This value is enforced only when
+      :attr:`~CompressionParameter.nb_workers` >= 1. Each compression job is
+      completed in parallel, so this value can indirectly impact the number of
+      active threads.
+
+      A value of zero causes the value to be selected automatically.
+
+   .. attribute:: overlap_log
+
+      Sets how much data is reloaded from previous jobs (threads) for new jobs
+      to be used by the look behind window during compression. This value is
+      only used when :attr:`~CompressionParameter.nb_workers` >= 1. Acceptable
+      values vary from 0 to 9.
+
+         * 0 means dynamically set the overlap amount
+         * 1 means no overlap
+         * 9 means use a full window size from the previous job
+
+      Each increment halves/doubles the overlap size. "8" means an overlap of
+      ``window_size/2``, "7" means an overlap of ``window_size/4``, etc.
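The overlap rule can be written out as arithmetic. This sketch extends the halving pattern stated for 9, 8, and 7 down to 2, which is an extrapolation from the text; `overlap_bytes` is a hypothetical helper and does not call the library.

```python
def overlap_bytes(window_size, overlap_log):
    """Bytes reloaded from the previous job, or None when chosen dynamically."""
    if not 0 <= overlap_log <= 9:
        raise ValueError("overlap_log must be between 0 and 9")
    if overlap_log == 0:
        return None  # the library picks the overlap itself
    if overlap_log == 1:
        return 0  # no overlap
    # 9 keeps the full window; each step down halves the overlap.
    return window_size >> (9 - overlap_log)


window = 1 << 20  # an illustrative 1 MiB window
for log_value in (9, 8, 7, 1):
    print(log_value, overlap_bytes(window, log_value))
```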
+
+.. class:: DecompressionParameter()
+
+   An :class:`~enum.IntEnum` containing the advanced decompression parameter
+   keys that can be used when decompressing data. Parameters are optional; any
+   omitted parameter will have its value selected automatically.
+
+   The :meth:`~.bounds` method can be used on any attribute to get the valid
+   values for that parameter.
+
+   Example setting the :attr:`~.window_log_max` to the maximum size::
+
+      data = compress(b'Some very long buffer of bytes...')
+
+      _lower, upper = DecompressionParameter.window_log_max.bounds()
+
+      options = {DecompressionParameter.window_log_max: upper}
+      decompress(data, options=options)
+
+   .. method:: bounds()
+
+      Return the tuple of int bounds, ``(lower, upper)``, of a decompression
+      parameter. This method should be called on the attribute you wish to
+      retrieve the bounds of.
+
+      Both the lower and upper bounds are inclusive.
+
+   .. attribute:: window_log_max
+
+      The base-two logarithm of the maximum size of the window used during
+      decompression. This can be useful to limit the amount of memory used when
+      decompressing data. A larger maximum window size leads to faster
+      decompression.
+
+      A value of zero causes the value to be selected automatically.
+
+
+.. class:: Strategy()
+
+   An :class:`~enum.IntEnum` containing strategies for compression.
+   Higher-numbered strategies correspond to more complex and slower
+   compression.
+
+   .. note::
+
+      The values of attributes of :class:`!Strategy` are not necessarily stable
+      across zstd versions. Only the ordering of the attributes may be relied
+      upon. The attributes are listed below in order.
+
+   The following strategies are available:
+
+   .. attribute:: fast
+
+   .. attribute:: dfast
+
+   .. attribute:: greedy
+
+   .. attribute:: lazy
+
+   .. attribute:: lazy2
+
+   .. attribute:: btlazy2
+
+   .. attribute:: btopt
+
+   .. attribute:: btultra
+
+   .. attribute:: btultra2
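+
+   A short sketch of selecting a strategy through
+   :attr:`~CompressionParameter.strategy`. The input is an arbitrary
+   illustration, and the comparison relies only on the documented ordering of
+   the attributes, not on their numeric values::
+
+      from compression import zstd
+
+      # Only the relative order of the members is relied upon here.
+      is_slower = zstd.Strategy.btultra2 > zstd.Strategy.fast
+
+      options = {zstd.CompressionParameter.strategy: zstd.Strategy.btultra2}
+      data = b"Some very long buffer of bytes..." * 1_000
+      compressed = zstd.compress(data, options=options)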
+
+
+Miscellaneous
+-------------
+
+.. function:: get_frame_info(frame_buffer)
+
+   Retrieve a :class:`FrameInfo` object containing metadata about a Zstandard
+   frame. Frames contain metadata related to the compressed data they hold.
+
+
+.. class:: FrameInfo
+
+   Metadata related to a Zstandard frame.
+
+   .. attribute:: decompressed_size
+
+      The size of the decompressed contents of the frame.
+
+   .. attribute:: dictionary_id
+
+      An integer representing the Zstandard dictionary ID needed for
+      decompressing the frame. ``0`` means the dictionary ID was not
+      recorded in the frame header. This may mean that a Zstandard dictionary
+      is not needed, or that the ID of a required dictionary was not recorded.
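+
+   A minimal sketch of inspecting a frame produced by :func:`compress`. The
+   input is arbitrary and no dictionary is involved, so ``dictionary_id`` is
+   expected to be ``0``::
+
+      from compression import zstd
+
+      data = b"Some very long buffer of bytes..."
+      frame = zstd.compress(data)
+
+      info = zstd.get_frame_info(frame)
+      size = info.decompressed_size  # size recorded in the frame header
+      dict_id = info.dictionary_id   # 0: no dictionary ID was recorded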
+
+
+.. attribute:: COMPRESSION_LEVEL_DEFAULT
+
+   The default compression level for Zstandard: ``3``.
+
+
+.. attribute:: zstd_version_info
+
+   Version number of the runtime zstd library as a tuple of integers
+   (major, minor, release).
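+
+   A sketch of checking the runtime library version. The minimum of
+   ``(1, 5, 0)`` is an arbitrary illustration, not a documented requirement::
+
+      from compression import zstd
+
+      # Tuples compare element by element, so this is a plain ordering test.
+      is_recent = zstd.zstd_version_info >= (1, 5, 0)
+      version_text = ".".join(map(str, zstd.zstd_version_info))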
+
+
+Examples
+--------
+
+Reading in a compressed file:
+
+.. code-block:: python
+
+   from compression import zstd
+
+   with zstd.open("file.zst") as f:
+       file_content = f.read()
+
+Creating a compressed file:
+
+.. code-block:: python
+
+   from compression import zstd
+
+   data = b"Insert Data Here"
+   with zstd.open("file.zst", "w") as f:
+       f.write(data)
+
+Compressing data in memory:
+
+.. code-block:: python
+
+   from compression import zstd
+
+   data_in = b"Insert Data Here"
+   data_out = zstd.compress(data_in)
+
+Incremental compression:
+
+.. code-block:: python
+
+   from compression import zstd
+
+   comp = zstd.ZstdCompressor()
+   out1 = comp.compress(b"Some data\n")
+   out2 = comp.compress(b"Another piece of data\n")
+   out3 = comp.compress(b"Even more data\n")
+   out4 = comp.flush()
+   # Concatenate all the partial results:
+   result = b"".join([out1, out2, out3, out4])
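+
+Incremental decompression, sketched as the counterpart of the example above
+(the chunk size of 16 bytes is arbitrary and only for illustration):
+
+.. code-block:: python
+
+   from compression import zstd
+
+   original = b"Some data\n" * 100
+   compressed = zstd.compress(original)
+
+   decomp = zstd.ZstdDecompressor()
+   pieces = []
+   # Feed the compressed frame in small chunks, as if read from a stream.
+   for start in range(0, len(compressed), 16):
+       pieces.append(decomp.decompress(compressed[start:start + 16]))
+   # Concatenate all the partial results:
+   result = b"".join(pieces)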
+
+Writing compressed data to an already-open file:
+
+.. code-block:: python
+
+   from compression import zstd
+
+   with open("myfile", "wb") as f:
+       f.write(b"This data will not be compressed\n")
+       with zstd.open(f, "w") as zstf:
+           zstf.write(b"This *will* be compressed\n")
+       f.write(b"Not compressed\n")
+
+Creating a compressed file using compression parameters:
+
+.. code-block:: python
+
+   from compression import zstd
+
+   options = {
+       zstd.CompressionParameter.checksum_flag: 1
+   }
+   with zstd.open("file.zst", "w", options=options) as f:
+       f.write(b"Mind if I squeeze in?")
