Source: insilicoseq
Version: 1.5.4-3
Severity: serious
X-Debbugs-CC: nu...@packages.debian.org
Tags: sid bookworm
User: debian...@lists.debian.org
Usertags: needs-update
Control: affects -1 src:numpy

Dear maintainer(s),

With a recent upload of numpy the autopkgtest of insilicoseq fails in testing when that autopkgtest is run with the binary packages of numpy from unstable. It passes when run with only packages from testing. In tabular form:

                       pass            fail
numpy                  from testing    1:1.23.5-2
insilicoseq            from testing    1.5.4-3
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of numpy to testing [1]. Of course, numpy shouldn't just break your autopkgtest (or even worse, your package), but it seems to me that the change in numpy was intended and your package needs to be updated for the new situation.
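For reference, the behavioral change can be reproduced without insilicoseq. A minimal sketch (the empty temporary file stands in for data/empty_file from the test suite):

```python
import pickle
import tempfile

import numpy as np

# An empty file is neither a zip archive (.npz), a .npy file, nor a valid
# pickle. numpy 1.23 wraps the failed pickle.load() in pickle.UnpicklingError
# (chaining the original EOFError); older numpy let the bare EOFError escape,
# which is what insilicoseq's error handling appears to rely on.
with tempfile.NamedTemporaryFile() as empty:
    try:
        np.load(empty.name, allow_pickle=True)
    except (EOFError, pickle.UnpicklingError) as exc:
        # EOFError on numpy < 1.23, UnpicklingError on numpy >= 1.23
        print(type(exc).__name__)
```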

If this is a real problem in your package (and not only in your autopkgtest), the right binary package(s) from numpy should really add a versioned Breaks on the unfixed version of (one of your) package(s). Note: the Breaks is nice even if the issue is only in the autopkgtest as it helps the migration software to figure out the right versions to combine in the tests.
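A possible fix on the insilicoseq side would be to catch the new exception type alongside the old one. This is only a hypothetical sketch: the module and function names come from the traceback below, but the original body of load_npz and its exact error handling are assumptions.

```python
# Hypothetical sketch of iss/error_models/__init__.py's load_npz, updated to
# tolerate both old and new numpy behavior when the model file is unreadable.
import pickle
import sys

import numpy as np


def load_npz(npz_path, model_name):
    try:
        profile = np.load(npz_path, allow_pickle=True)
    except (OSError, EOFError, pickle.UnpicklingError) as exc:
        # numpy >= 1.23 raises pickle.UnpicklingError for files that are
        # neither .npy/.npz nor a valid pickle; older numpy raised EOFError.
        print('Failed to read %s model from %s: %s'
              % (model_name, npz_path, exc), file=sys.stderr)
        sys.exit(1)
    return profile
```

With this change the test's `pytest.raises(SystemExit)` expectation would hold on both numpy versions.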

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=numpy

https://ci.debian.net/data/autopkgtest/testing/amd64/i/insilicoseq/29465792/log.gz

=================================== FAILURES ===================================
_______________________________ test_bad_err_mod _______________________________

file = 'data/empty_file', mmap_mode = None, allow_pickle = True
fix_imports = True, encoding = 'ASCII'

    @set_module('numpy')
    def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True,
             encoding='ASCII', *, max_header_size=format._MAX_HEADER_SIZE):
        """
Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files. .. warning:: Loading files that contain object arrays uses the ``pickle`` module, which is not secure against erroneous or maliciously constructed data. Consider passing ``allow_pickle=False`` to load data that is known not to contain object arrays for the
                     safer handling of untrusted sources.
            Parameters
        ----------
        file : file-like object, string, or pathlib.Path
            The file to read. File-like objects must support the
            ``seek()`` and ``read()`` methods and must always
            be opened in binary mode.  Pickled files require that the
            file-like object support the ``readline()`` method as well.
        mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional
If not None, then memory-map the file, using the given mode (see
            `numpy.memmap` for a detailed description of the modes).  A
memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the
            entire file into memory.
        allow_pickle : bool, optional
Allow loading pickled object arrays stored in npy files. Reasons for disallowing pickles include security, as loading pickled data can execute arbitrary code. If pickles are disallowed, loading object
            arrays will fail. Default: False
                .. versionchanged:: 1.16.3
                Made default False in response to CVE-2019-6446.
            fix_imports : bool, optional
Only useful when loading Python 2 generated pickled files on Python 3, which includes npy/npz files containing object arrays. If `fix_imports` is True, pickle will try to map the old Python 2 names to the new names
            used in Python 3.
        encoding : str, optional
What encoding to use when reading Python 2 strings. Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays. Values other than 'latin1', 'ASCII', and 'bytes' are not allowed, as they can corrupt numerical
            data. Default: 'ASCII'
        max_header_size : int, optional
Maximum allowed size of the header. Large headers may not be safe to load securely and thus require explicitly passing a larger value.
            See :py:meth:`ast.literal_eval()` for details.
This option is ignored when `allow_pickle` is passed. In that case
            the file is by definition trusted and the limit is unnecessary.
            Returns
        -------
        result : array, tuple, dict, etc.
Data stored in the file. For ``.npz`` files, the returned instance of NpzFile class must be closed to avoid leaking file descriptors.
            Raises
        ------
        OSError
            If the input file does not exist or cannot be read.
        UnpicklingError
If ``allow_pickle=True``, but the file cannot be loaded as a pickle.
        ValueError
The file contains an object array, but ``allow_pickle=False`` given.
            See Also
        --------
        save, savez, savez_compressed, loadtxt
        memmap : Create a memory-map to an array stored in a file on disk.
lib.format.open_memmap : Create or load a memory-mapped ``.npy`` file.
            Notes
        -----
        - If the file contains pickle data, then whatever object is stored
          in the pickle is returned.
        - If the file is a ``.npy`` file, then a single array is returned.
        - If the file is a ``.npz`` file, then a dictionary-like object is
returned, containing ``{filename: array}`` key-value pairs, one for
          each file in the archive.
        - If the file is a ``.npz`` file, the returned value supports the
context manager protocol in a similar fashion to the open function::
                with load('foo.npz') as data:
                a = data['a']
The underlying file descriptor is closed when exiting the 'with'
          block.
            Examples
        --------
        Store data to disk, and load it again:
            >>> np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
        >>> np.load('/tmp/123.npy')
        array([[1, 2, 3],
               [4, 5, 6]])
            Store compressed data to disk, and load it again:
            >>> a=np.array([[1, 2, 3], [4, 5, 6]])
        >>> b=np.array([1, 2])
        >>> np.savez('/tmp/123.npz', a=a, b=b)
        >>> data = np.load('/tmp/123.npz')
        >>> data['a']
        array([[1, 2, 3],
               [4, 5, 6]])
        >>> data['b']
        array([1, 2])
        >>> data.close()
            Mem-map the stored array, and then access the second row
        directly from disk:
            >>> X = np.load('/tmp/123.npy', mmap_mode='r')
        >>> X[1, :]
        memmap([4, 5, 6])
            """
        if encoding not in ('ASCII', 'latin1', 'bytes'):
            # The 'encoding' value for pickle also affects what encoding
            # the serialized binary data of NumPy arrays is loaded
            # in. Pickle does not pass on the encoding information to
            # NumPy. The unpickling code in numpy.core.multiarray is
            # written to assume that unicode data appearing where binary
            # should be is in 'latin1'. 'bytes' is also safe, as is 'ASCII'.
            #
            # Other encoding values can corrupt binary data, and we
            # purposefully disallow them. For the same reason, the errors=
            # argument is not exposed, as values other than 'strict'
            # result can similarly silently corrupt numerical data.
            raise ValueError("encoding must be 'ASCII', 'latin1', or 'bytes'")

        pickle_kwargs = dict(encoding=encoding, fix_imports=fix_imports)

        with contextlib.ExitStack() as stack:
            if hasattr(file, 'read'):
                fid = file
                own_fid = False
            else:
                fid = stack.enter_context(open(os_fspath(file), "rb"))
                own_fid = True

            # Code to distinguish from NumPy binary files and pickles.
            _ZIP_PREFIX = b'PK\x03\x04'
            _ZIP_SUFFIX = b'PK\x05\x06'  # empty zip files start with this
            N = len(format.MAGIC_PREFIX)
            magic = fid.read(N)
            # If the file size is less than N, we need to make sure not
            # to seek past the beginning of the file
            fid.seek(-min(N, len(magic)), 1)  # back-up
            if magic.startswith(_ZIP_PREFIX) or magic.startswith(_ZIP_SUFFIX):
                # zip-file (assume .npz)
                # Potentially transfer file ownership to NpzFile
                stack.pop_all()
                ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
                              pickle_kwargs=pickle_kwargs,
                              max_header_size=max_header_size)
                return ret
            elif magic == format.MAGIC_PREFIX:
                # .npy file
                if mmap_mode:
                    if allow_pickle:
                        max_header_size = 2**64
                    return format.open_memmap(file, mode=mmap_mode,
                                              max_header_size=max_header_size)
                else:
                    return format.read_array(fid, allow_pickle=allow_pickle,
                                             pickle_kwargs=pickle_kwargs,
                                             max_header_size=max_header_size)
            else:
                # Try a pickle
                if not allow_pickle:
                    raise ValueError("Cannot load file containing pickled data "
                                     "when allow_pickle=False")
                try:
>                   return pickle.load(fid, **pickle_kwargs)
E                   EOFError: Ran out of input

/usr/lib/python3/dist-packages/numpy/lib/npyio.py:441: EOFError

The above exception was the direct cause of the following exception:

    def test_bad_err_mod():
        with pytest.raises(SystemExit):
>           err_mod = kde.KDErrorModel('data/empty_file')

test/test_error_model.py:93:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/iss/error_models/kde.py:34: in __init__
    self.error_profile = self.load_npz(npz_path, 'kde')
/usr/lib/python3/dist-packages/iss/error_models/__init__.py:37: in load_npz
    error_profile = np.load(npz_path, allow_pickle=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
file = 'data/empty_file', mmap_mode = None, allow_pickle = True
fix_imports = True, encoding = 'ASCII'

    @set_module('numpy')
    def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True,
             encoding='ASCII', *, max_header_size=format._MAX_HEADER_SIZE):
        """
Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files. .. warning:: Loading files that contain object arrays uses the ``pickle`` module, which is not secure against erroneous or maliciously constructed data. Consider passing ``allow_pickle=False`` to load data that is known not to contain object arrays for the
                     safer handling of untrusted sources.
            Parameters
        ----------
        file : file-like object, string, or pathlib.Path
            The file to read. File-like objects must support the
            ``seek()`` and ``read()`` methods and must always
            be opened in binary mode.  Pickled files require that the
            file-like object support the ``readline()`` method as well.
        mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional
If not None, then memory-map the file, using the given mode (see
            `numpy.memmap` for a detailed description of the modes).  A
memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the
            entire file into memory.
        allow_pickle : bool, optional
Allow loading pickled object arrays stored in npy files. Reasons for disallowing pickles include security, as loading pickled data can execute arbitrary code. If pickles are disallowed, loading object
            arrays will fail. Default: False
                .. versionchanged:: 1.16.3
                Made default False in response to CVE-2019-6446.
            fix_imports : bool, optional
Only useful when loading Python 2 generated pickled files on Python 3, which includes npy/npz files containing object arrays. If `fix_imports` is True, pickle will try to map the old Python 2 names to the new names
            used in Python 3.
        encoding : str, optional
What encoding to use when reading Python 2 strings. Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays. Values other than 'latin1', 'ASCII', and 'bytes' are not allowed, as they can corrupt numerical
            data. Default: 'ASCII'
        max_header_size : int, optional
Maximum allowed size of the header. Large headers may not be safe to load securely and thus require explicitly passing a larger value.
            See :py:meth:`ast.literal_eval()` for details.
This option is ignored when `allow_pickle` is passed. In that case
            the file is by definition trusted and the limit is unnecessary.
            Returns
        -------
        result : array, tuple, dict, etc.
Data stored in the file. For ``.npz`` files, the returned instance of NpzFile class must be closed to avoid leaking file descriptors.
            Raises
        ------
        OSError
            If the input file does not exist or cannot be read.
        UnpicklingError
If ``allow_pickle=True``, but the file cannot be loaded as a pickle.
        ValueError
The file contains an object array, but ``allow_pickle=False`` given.
            See Also
        --------
        save, savez, savez_compressed, loadtxt
        memmap : Create a memory-map to an array stored in a file on disk.
lib.format.open_memmap : Create or load a memory-mapped ``.npy`` file.
            Notes
        -----
        - If the file contains pickle data, then whatever object is stored
          in the pickle is returned.
        - If the file is a ``.npy`` file, then a single array is returned.
        - If the file is a ``.npz`` file, then a dictionary-like object is
returned, containing ``{filename: array}`` key-value pairs, one for
          each file in the archive.
        - If the file is a ``.npz`` file, the returned value supports the
context manager protocol in a similar fashion to the open function::
                with load('foo.npz') as data:
                a = data['a']
The underlying file descriptor is closed when exiting the 'with'
          block.
            Examples
        --------
        Store data to disk, and load it again:
            >>> np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
        >>> np.load('/tmp/123.npy')
        array([[1, 2, 3],
               [4, 5, 6]])
            Store compressed data to disk, and load it again:
            >>> a=np.array([[1, 2, 3], [4, 5, 6]])
        >>> b=np.array([1, 2])
        >>> np.savez('/tmp/123.npz', a=a, b=b)
        >>> data = np.load('/tmp/123.npz')
        >>> data['a']
        array([[1, 2, 3],
               [4, 5, 6]])
        >>> data['b']
        array([1, 2])
        >>> data.close()
            Mem-map the stored array, and then access the second row
        directly from disk:
            >>> X = np.load('/tmp/123.npy', mmap_mode='r')
        >>> X[1, :]
        memmap([4, 5, 6])
            """
        if encoding not in ('ASCII', 'latin1', 'bytes'):
            # The 'encoding' value for pickle also affects what encoding
            # the serialized binary data of NumPy arrays is loaded
            # in. Pickle does not pass on the encoding information to
            # NumPy. The unpickling code in numpy.core.multiarray is
            # written to assume that unicode data appearing where binary
            # should be is in 'latin1'. 'bytes' is also safe, as is 'ASCII'.
            #
            # Other encoding values can corrupt binary data, and we
            # purposefully disallow them. For the same reason, the errors=
            # argument is not exposed, as values other than 'strict'
            # result can similarly silently corrupt numerical data.
            raise ValueError("encoding must be 'ASCII', 'latin1', or 'bytes'")

        pickle_kwargs = dict(encoding=encoding, fix_imports=fix_imports)

        with contextlib.ExitStack() as stack:
            if hasattr(file, 'read'):
                fid = file
                own_fid = False
            else:
                fid = stack.enter_context(open(os_fspath(file), "rb"))
                own_fid = True

            # Code to distinguish from NumPy binary files and pickles.
            _ZIP_PREFIX = b'PK\x03\x04'
            _ZIP_SUFFIX = b'PK\x05\x06'  # empty zip files start with this
            N = len(format.MAGIC_PREFIX)
            magic = fid.read(N)
            # If the file size is less than N, we need to make sure not
            # to seek past the beginning of the file
            fid.seek(-min(N, len(magic)), 1)  # back-up
            if magic.startswith(_ZIP_PREFIX) or magic.startswith(_ZIP_SUFFIX):
                # zip-file (assume .npz)
                # Potentially transfer file ownership to NpzFile
                stack.pop_all()
                ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
                              pickle_kwargs=pickle_kwargs,
                              max_header_size=max_header_size)
                return ret
            elif magic == format.MAGIC_PREFIX:
                # .npy file
                if mmap_mode:
                    if allow_pickle:
                        max_header_size = 2**64
                    return format.open_memmap(file, mode=mmap_mode,
                                              max_header_size=max_header_size)
                else:
                    return format.read_array(fid, allow_pickle=allow_pickle,
                                             pickle_kwargs=pickle_kwargs,
                                             max_header_size=max_header_size)
            else:
                # Try a pickle
                if not allow_pickle:
                    raise ValueError("Cannot load file containing pickled data "
                                     "when allow_pickle=False")
                try:
                    return pickle.load(fid, **pickle_kwargs)
                except Exception as e:
>                   raise pickle.UnpicklingError(
                        f"Failed to interpret file {file!r} as a pickle") from e
E                   _pickle.UnpicklingError: Failed to interpret file 'data/empty_file' as a pickle

/usr/lib/python3/dist-packages/numpy/lib/npyio.py:443: UnpicklingError
=============================== warnings summary ===============================
../../../../usr/lib/python3/dist-packages/joblib/backports.py:22
/usr/lib/python3/dist-packages/joblib/backports.py:22: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
    import distutils  # noqa

test/test_bam.py: 12 warnings
/usr/lib/python3/dist-packages/iss/modeller.py:56: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    read = np.fromiter((q[0] for q in quality), dtype=np.float)

test/test_bam.py::test_to_model
/usr/lib/python3/dist-packages/numpy/lib/npyio.py:716: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    val = np.asanyarray(val)

test/test_generator.py::test_simulate_and_save
test/test_generator.py::test_simulate_and_save_short
/usr/lib/python3/dist-packages/Bio/SeqUtils/__init__.py:144: BiopythonDeprecationWarning: GC is deprecated; please use gc_fraction instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_error_model.py::test_bad_err_mod - _pickle.UnpicklingError: ...
============= 1 failed, 40 passed, 1 skipped, 16 warnings in 5.81s =============
autopkgtest [09:24:52]: test run-unit-test
