Control: forwarded -1 https://github.com/python/cpython/issues/92132
Hi,
On Fri, 29 Apr 2022 21:00:52 -0700 Keith Amling wrote:
> From skimming some of cpython's "marshal" code [1] my best guess is that
> first difference is between it thinking the `_m` string might have
> another reference to it (and thus adding 0x80, or FLAG_REF to it) or
> not. This seems driven by whether or not python's object for the string
> has other references (it calls Py_REFCNT(v) to decide, see line 302).
>
> I assume the difference is whether or not python has bothered to collect
> some other reference to the string or not. Type "Z" is an interned
> string type, TYPE_SHORT_ASCII_INTERNED, which therefore makes sense that
> it might be shared with who knows what else. I'm assuming this stops
> reproducing when you change it to a unique name since no one else will share
> the reference and you'll just deterministically get no FLAG_REF.
thank you! It was indeed about that line and there exists a pull request
upstream that fixes this issue:
https://github.com/python/cpython/pull/8226
Specifically, the following patch to python3.10 in Debian seems to solve this.
I also attached a full debdiff for your convenience. Thanks!
cheers, josch
>From 6c8ea7c1dacd42f3ba00440231ec0e6b1a38300d Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Sat, 14 Jul 2018 00:46:11 +0900
Subject: [PATCH] Use FLAG_REF always for interned strings
---
Python/marshal.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/Python/marshal.c
+++ b/Python/marshal.c
@@ -298,9 +298,14 @@ w_ref(PyObject *v, char *flag, WFILE *p)
if (p->version < 3 || p->hashtable == NULL)
return 0; /* not writing object references */
-/* if it has only one reference, it definitely isn't shared */
-if (Py_REFCNT(v) == 1)
+/* If it has only one reference, it definitely isn't shared.
+ * But we use TYPE_REF always for interned string, to PYC file stable
+ * as possible.
+ */
+if (Py_REFCNT(v) == 1 &&
+!(PyUnicode_CheckExact(v) && PyUnicode_CHECK_INTERNED(v))) {
return 0;
+}
entry = _Py_hashtable_get_entry(p->hashtable, v);
if (entry != NULL) {diff -Nru python3.10-3.10.4/debian/changelog python3.10-3.10.4/debian/changelog
--- python3.10-3.10.4/debian/changelog 2022-04-02 11:04:19.0 +0200
+++ python3.10-3.10.4/debian/changelog 2022-05-02 13:25:08.0 +0200
@@ -1,3 +1,10 @@
+python3.10 (3.10.4-3.1) UNRELEASED; urgency=medium
+
+ * Non-maintainer upload.
+ * Add patch to set FLAG_REF always for interned strings. Closes: #1010368
+
+ -- Johannes Schauer Marin Rodrigues Mon, 02 May 2022 13:25:08 +0200
+
python3.10 (3.10.4-3) unstable; urgency=medium
* Build a python3.10-nopie package, diverting the python3.10
diff -Nru python3.10-3.10.4/debian/patches/series python3.10-3.10.4/debian/patches/series
--- python3.10-3.10.4/debian/patches/series 2022-04-02 11:04:19.0 +0200
+++ python3.10-3.10.4/debian/patches/series 2022-05-02 13:24:24.0 +0200
@@ -39,3 +39,4 @@
fix-py_compile.diff
reproducible-pyc.diff
fix-ia64.diff
+use-FLAG_REF-always-for-interned-strings.diff
diff -Nru python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff
--- python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff 1970-01-01 01:00:00.0 +0100
+++ python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff 2022-05-02 13:25:07.0 +0200
@@ -0,0 +1,37 @@
+From 5fe4c1442ded6bdc4c724935d118e996fa022eac Mon Sep 17 00:00:00 2001
+From: Inada Naoki
+Date: Sat, 14 Jul 2018 00:46:11 +0900
+Subject: [PATCH] Use FLAG_REF always for interned strings
+
+https://bugs.debian.org/1010368
+https://github.com/python/cpython/pull/8226
+https://github.com/python/cpython/issues/92132
+
+---
+ Python/marshal.c | 9 +++--
+ 1 file changed, 7 insertions(+), 2 deletions(-)
+
+diff --git a/Python/marshal.c b/Python/marshal.c
+index 4125240..341c9aa 100644
+--- a/Python/marshal.c
b/Python/marshal.c
+@@ -298,9 +298,14 @@ w_ref(PyObject *v, char *flag, WFILE *p)
+ if (p->version < 3 || p->hashtable == NULL)
+ return 0; /* not writing object references */
+
+-/* if it has only one reference, it definitely isn't shared */
+-if (Py_REFCNT(v) == 1)
++/* If it has only one reference, it definitely isn't shared.
++ * But we use TYPE_REF always for interned string, to PYC file stable
++ * as possible.
++ */
++if (Py_REFCNT(v) == 1 &&
++!(PyUnicode_CheckExact(v) && PyUnicode_CHECK_INTERNED(v))) {
+ return 0;
++}
+
+ entry = _Py_hashtable_get_entry(p->hashtable, v);
+ if (entry != NULL) {
+--
+2.35.1
+
signature.asc
Description: signature