Your message dated Sat, 19 Dec 2020 01:20:17 +0000
with message-id <[email protected]>
and subject line Bug#975915: fixed in python-debian 0.1.39
has caused the Debian Bug report #975915,
regarding python3-debian: Deb822 objects cannot be serialized with pickle module
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
975915: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=975915
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: python3-debian
Version: 0.1.38
Severity: normal
Tags: patch

Deb822 objects cannot be serialized properly by the pickle module.

It turns out the underlying dict uses _CaseInsensitiveString keys,
which store a precomputed hash of the lowercased `str`. However, Python
randomizes `str` hashes per interpreter run, so a
_CaseInsensitiveString restored by the pickle module carries a
different hash than a newly created _CaseInsensitiveString, and key
lookups fail. As a workaround one can set the environment variable
`PYTHONHASHSEED=0`.
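
The failure mode can be sketched in a single process. This is only an
illustration: `PrecomputedHashKey` is a hypothetical stand-in for the
old _CaseInsensitiveString (its `__eq__` is an assumption), and the
second interpreter with a different hash seed is simulated by altering
the cached hash by hand before pickling.

```python
import pickle


class PrecomputedHashKey(str):
    """Hypothetical stand-in for the old _CaseInsensitiveString."""

    def __new__(cls, value):
        s = str.__new__(cls, value)
        s.str_lower = value.lower()
        s.str_lower_hash = hash(s.str_lower)  # cached once, at creation
        return s

    def __hash__(self):
        return self.str_lower_hash

    def __eq__(self, other):
        # Assumed comparison: case-insensitive equality.
        return self.str_lower == other.lower()


key = PrecomputedHashKey("Field")
# Simulate a hash that was cached under a different PYTHONHASHSEED.
key.str_lower_hash ^= 1
payload = pickle.dumps({key: "value"})

restored = pickle.loads(payload)
restored_key = next(iter(restored))
fresh_key = PrecomputedHashKey("Field")

# pickle calls __new__ and then restores the instance state on top of
# it, so the stale cached hash overwrites the freshly computed one.
stale_hash_restored = restored_key.str_lower_hash == key.str_lower_hash
# The keys compare equal, but the dict lookup uses the hash first.
found = fresh_key in restored

print(stale_hash_restored, found)
```

In this sketch the restored key still compares equal to a fresh key,
but the membership test fails because the two hashes differ.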

A short demo program is attached: call deb822pickler.py first, then
deb822unpickler.py.

I attached patches to just drop the precomputed hash. This makes
repeatedly calling `hash()` on the same string a bit slower, but on
the other hand calling `hash()` only once is a bit faster. I don't
think one usually needs to call `hash()` on the same string
repeatedly.

Also attached is a patch using `__slots__` to declare the member
variable. This makes creating a _CaseInsensitiveString and calling
`hash()` a bit faster.
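
As a rough sketch of the combined effect of both patches: the class
below computes the hash on demand and declares its one attribute via
`__slots__`. `SlottedKey` is a hypothetical name, and its `__eq__` is
an assumption rather than a copy of the real implementation. Since no
hash is stored on the instance, nothing stale can be restored by
pickle.

```python
import pickle


class SlottedKey(str):
    """Hypothetical stand-in for the patched _CaseInsensitiveString."""

    __slots__ = ("str_lower",)

    def __new__(cls, value):
        s = str.__new__(cls, value)
        s.str_lower = value.lower()
        return s

    def __hash__(self):
        # Computed on demand, so it always matches the running
        # interpreter's hash seed.
        return hash(self.str_lower)

    def __eq__(self, other):
        # Assumed comparison: case-insensitive equality.
        return self.str_lower == other.lower()


original = {SlottedKey("Field"): "value"}
restored = pickle.loads(pickle.dumps(original))

# Lookup with a freshly created key of different case still works.
lookup_ok = restored[SlottedKey("FIELD")] == "value"
# With __slots__ there is no per-instance __dict__.
has_instance_dict = hasattr(SlottedKey("Field"), "__dict__")

print(lookup_ok, has_instance_dict)
```

Note that this round trip happens within one interpreter, so it only
shows that the pickled state no longer contains a hash value; the
attached demo scripts exercise the cross-process case.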

I've attached a short benchmark script to compare the
implementations. On my system I get:

Using __slots__:

  Calling hash() repeatedly:
  Old implementation:  0.14679651000187732
  New implementation:  0.16284992700093426
  Creating string and calling hash once:
  Old implementation:  0.6289306499966187
  New implementation:  0.5861582299985457

Not using __slots__:

  Calling hash() repeatedly:
  Old implementation:  0.15407300100196153
  New implementation:  0.1629600290034432
  Creating string and calling hash once:
  Old implementation:  0.6767919499980053
  New implementation:  0.614397427008953

Ansgar
#! /usr/bin/python3

import pickle
from debian.deb822 import Deb822

d = Deb822("Field: value")

with open("db.pickle", "wb") as fh:
    pickle.dump(d, fh)
#! /usr/bin/python3

import pickle
from debian.deb822 import Deb822

with open("db.pickle", "rb") as fh:
    d = pickle.load(fh)

print(d)
#! /usr/bin/python3

import timeit

class _OldCaseInsensitiveString(str):
    #str_lower = ''
    #str_lower_hash = 0
    __slots__ = 'str_lower', 'str_lower_hash'

    def __new__(cls, str_): # type: ignore
        s = str.__new__(cls, str_)    # type: ignore
        s.str_lower = str_.lower()
        s.str_lower_hash = hash(s.str_lower)
        return s

    def __hash__(self):
        # type: () -> int
        return self.str_lower_hash


class _CaseInsensitiveString(str):
    #str_lower = ''
    __slots__ = 'str_lower'

    def __new__(cls, str_): # type: ignore
        s = str.__new__(cls, str_)    # type: ignore
        s.str_lower = str_.lower()
        return s

    def __hash__(self):
        # type: () -> int
        return hash(self.str_lower)

s_old = _OldCaseInsensitiveString("Field")
t_old = timeit.timeit('hash(s_old)', globals=globals())

s_new = _CaseInsensitiveString("Field")
t_new = timeit.timeit('hash(s_new)', globals=globals())

print("Calling hash() repeatedly:")
print("Old implementation: ", t_old)
print("New implementation: ", t_new)

t_old = timeit.timeit('hash(_OldCaseInsensitiveString("Field"))',
                      globals=globals())
t_new = timeit.timeit('hash(_CaseInsensitiveString("Field"))',
                      globals=globals())

print("Creating string and calling hash once:")
print("Old implementation: ", t_old)
print("New implementation: ", t_new)
From 54165117266fcf419bc572e30bf66b4249074328 Mon Sep 17 00:00:00 2001
From: Ansgar <[email protected]>
Date: Thu, 26 Nov 2020 16:42:16 +0100
Subject: [PATCH 1/2] _CaseInsensitiveString: do not precompute hash

This allows the "pickle" module to work correctly: hashing a str will
give different results in different invocations of the Python
interpreter, so the old hash value should not be restored. Not storing
the hash value at all works for this.
---
 lib/debian/deb822.py | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/debian/deb822.py b/lib/debian/deb822.py
index 9481a9c..d07abbb 100644
--- a/lib/debian/deb822.py
+++ b/lib/debian/deb822.py
@@ -2513,17 +2513,15 @@ class _CaseInsensitiveString(str):
     # Fake definitions because mypy doesn't find them in __new__ ## CRUFT
     # https://github.com/python/mypy/issues/1021
     str_lower = ''
-    str_lower_hash = 0
 
     def __new__(cls, str_): # type: ignore
         s = str.__new__(cls, str_)    # type: ignore
         s.str_lower = str_.lower()
-        s.str_lower_hash = hash(s.str_lower)
         return s
 
     def __hash__(self):
         # type: () -> int
-        return self.str_lower_hash
+        return hash(self.str_lower)
 
     def __eq__(self, other):
         # type: (Any) -> Any
-- 
2.29.2

From a547de6d8acee22eeb6f73ab570de76e80f21e6c Mon Sep 17 00:00:00 2001
From: Ansgar <[email protected]>
Date: Thu, 26 Nov 2020 16:46:49 +0100
Subject: [PATCH 2/2] _CaseInsensitiveString: declare data member using
 `__slots__`

---
 lib/debian/deb822.py | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/debian/deb822.py b/lib/debian/deb822.py
index d07abbb..748383e 100644
--- a/lib/debian/deb822.py
+++ b/lib/debian/deb822.py
@@ -2510,9 +2510,7 @@ class Removals(Deb822):
 class _CaseInsensitiveString(str):
     """Case insensitive string.
     """
-    # Fake definitions because mypy doesn't find them in __new__ ## CRUFT
-    # https://github.com/python/mypy/issues/1021
-    str_lower = ''
+    __slots__ = 'str_lower'
 
     def __new__(cls, str_): # type: ignore
         s = str.__new__(cls, str_)    # type: ignore
-- 
2.29.2


--- End Message ---
--- Begin Message ---
Source: python-debian
Source-Version: 0.1.39
Done: Stuart Prescott <[email protected]>

We believe that the bug you reported is fixed in the latest version of
python-debian, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to [email protected],
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Stuart Prescott <[email protected]> (supplier of updated python-debian package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing [email protected])


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Sat, 19 Dec 2020 11:59:40 +1100
Source: python-debian
Architecture: source
Version: 0.1.39
Distribution: unstable
Urgency: medium
Maintainer: Debian python-debian Maintainers <[email protected]>
Changed-By: Stuart Prescott <[email protected]>
Closes: 875306 970824 971960 975910 975915 977355
Changes:
 python-debian (0.1.39) unstable; urgency=medium
 .
   [ Stuart Prescott ]
   * Move re.compile calls out of functions (Closes: #971960).
   * Revert unintended renaming of Changelog.get_version/set_version
     (Closes: #975910).
   * Add a type for .buildinfo files (deb822.BuildInfo) (Closes: #875306).
   * Add support for SHA1-Download and SHA256-* variants in PdiffIndex class
     for .diff/Index files (Closes: #970824).
   * Permit single-character package names in dependency relationship
     specifications (Closes: #977355).
   * Silence deprecation warnings in the test suite.
   * Test that UserWarning is emitted in tests where it should be.
   * Update Standards-Version to 4.5.1 (no changes required).
   * Update to debhelper-compat (= 13).
   * Update examples to use #!/usr/bin/python3.
   * Fix tabs vs spaces in examples.
 .
   [ Jose Luis Rivero ]
   * Allow debian_support.PackageFile to accept StringIO as well as BytesIO.
 .
   [ Ansgar ]
   * Change handling of case-insensitive field names to allow Deb822 objects
     to be serialised  (Closes: #975915).
 .
   [ Jelmer Vernooij ]
   * Add myself to uploaders.
 .
   [ Johannes 'josch' Schauer ]
   * Add SHA256 support to handling of pdiffs.
   * Add support for additional headers for merged pdiffs to PDiffIndex.
   * Allow debian_support.patches_from_ed_script to consume both bytes and str.
Checksums-Sha1:
 d227fbcc7e8221a4922f52f4e90c11ec5f4bbcd5 2226 python-debian_0.1.39.dsc
 864d76496d8f42278f441b013e6f0523af171ec6 319020 python-debian_0.1.39.tar.xz
 a0f3c2cee0f8bd25734c3a83d3e963875de3aa93 6705 python-debian_0.1.39_amd64.buildinfo
Checksums-Sha256:
 ae048bc8b3b480cabd7e2619417e7743322c1bd8db9958552763c757ae0a05d3 2226 python-debian_0.1.39.dsc
 dcb119620e01ae9c913b315b03cdd5a55f00f1d0890f5bb5b8aaafd7fb0a322e 319020 python-debian_0.1.39.tar.xz
 4aed74902d3dbfe1f113802b29e23be30b4377fc8098a26b4958521952c0766e 6705 python-debian_0.1.39_amd64.buildinfo
Files:
 9719ac6f2a94f263a7a91df55e3343c5 2226 python optional python-debian_0.1.39.dsc
 8d4472af3e5c9a9baa08a20d0be82c6a 319020 python optional python-debian_0.1.39.tar.xz
 a7a7adf48d2ce2c3bbcdd92e683c82a3 6705 python optional python-debian_0.1.39_amd64.buildinfo

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEkOLSwa0Uaht+u4kdu8F+uxOW8vcFAl/dUjsACgkQu8F+uxOW
8vfgzg/+LmAu99a3Q8BIXAinoffrtQUYGTwFrqokNdd/Hs9w47sViTHvueJP+jmF
bM1sw5OHSFfJohGOy2Nz3U1c3I1xeaxTivSZP75FkdF5gLbQyGTZNyPii9xoDFxv
V0bMy7cT/Mz/phZEDHnkCR5CZi3tJQz+zCtiW2nn75Q0gXOmkI8ZSa7BzxSn0cUt
7VldmtNsvwsxxPjpT9/zu5LFL5NlvcWvL4hnb2UyKiwxINGAKQyvHZRkWHXEPgxE
4/iI+E1NRl4N9FsdlmDpibb7vV3JGhA/N52kN5uzJsJ9l9ZPijl3KCawKDFLQ6pD
muk0y7vQcFHtuHtg67Hip2BEfUvlAdCr0bdhF5C9JR424bHcKYbq5xMM2YH0apm5
pB/GiJlnznDg8gwW+zYIE9PcNgBRIufIfZz8LG7ZEgIk3oqHSvvPH7RlSUovCuP+
z7enn3SJ5vh8ctV2sYJTrT2iTC58qZQHK3ECK9UffLo8loAr4otiXcBV3TqN3GAa
i41NDtWnRnWsKjbHceypWuImCYUmzCgJjNbElHz6cTsVkVO6gxCiRYX9e223bQ1c
bPLuv51TbtBNaZQ69Ew/nGhkeB1i38Kp7lEGAOv1G035ZvCAs+ykArtbSVo7Tw5w
fh9QqDhRKNIeYIrAzLnXoMIR6KEAFww4uXNftqHpaglGKzgL5rc=
=vhbf
-----END PGP SIGNATURE-----

--- End Message ---
-- 
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/pkg-python-debian-maint
