Bug#1010368: Maybe marshal nondeterminism?

2022-05-04 Thread Johannes Schauer Marin Rodrigues
Hi,

Quoting Johannes Schauer Marin Rodrigues (2022-05-02 13:31:15)
> thank you! It was indeed about that line and there exists a pull request
> upstream that fixes this issue:
> 
> https://github.com/python/cpython/pull/8226
> 
> Specifically, the following patch to python3.10 in Debian seems to solve this.
> I also attached a full debdiff for your convenience. Thanks!

that the pull request has now been merged into main and I guess it's thus safe
to backport it to 3.10:

https://github.com/python/cpython/commit/6dcfd6c5e3cb46543e82dc3f7234546adf4bb04a

Thanks!

cheers, josch

signature.asc
Description: signature


Bug#1010368: Maybe marshal nondeterminism?

2022-05-02 Thread Johannes Schauer Marin Rodrigues
Control: forwarded -1 https://github.com/python/cpython/issues/92132

Hi,

On Fri, 29 Apr 2022 21:00:52 -0700 Keith Amling  wrote:
> From skimming some of cpython's "marshal" code [1] my best guess is that
> first difference is between it thinking the `_m` string might have
> another reference to it (and thus adding 0x80, or FLAG_REF to it) or
> not.  This seems driven by whether or not python's object for the string
> has other references (it calls Py_REFCNT(v) to decide, see line 302).
> 
> I assume the difference is whether or not python has bothered to collect
> some other reference to the string or not.  Type "Z" is an interned
> string type, TYPE_SHORT_ASCII_INTERNED, which therefore makes sense that
> it might be shared with who knows what else.  I'm assuming this stops
> reproducing when you change it to a unique name since no one else will share
> the reference and you'll just deterministically get no FLAG_REF.

thank you! It was indeed about that line and there exists a pull request
upstream that fixes this issue:

https://github.com/python/cpython/pull/8226

Specifically, the following patch to python3.10 in Debian seems to solve this.
I also attached a full debdiff for your convenience. Thanks!

cheers, josch

>From 6c8ea7c1dacd42f3ba00440231ec0e6b1a38300d Mon Sep 17 00:00:00 2001
From: Inada Naoki 
Date: Sat, 14 Jul 2018 00:46:11 +0900
Subject: [PATCH] Use FLAG_REF always for interned strings

---
 Python/marshal.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/Python/marshal.c
+++ b/Python/marshal.c
@@ -298,9 +298,14 @@ w_ref(PyObject *v, char *flag, WFILE *p)
 if (p->version < 3 || p->hashtable == NULL)
 return 0; /* not writing object references */

-/* if it has only one reference, it definitely isn't shared */
-if (Py_REFCNT(v) == 1)
+/* If it has only one reference, it definitely isn't shared.
+ * But we use TYPE_REF always for interned string, to PYC file stable
+ * as possible.
+ */
+if (Py_REFCNT(v) == 1 &&
+!(PyUnicode_CheckExact(v) && PyUnicode_CHECK_INTERNED(v))) {
 return 0;
+}

 entry = _Py_hashtable_get_entry(p->hashtable, v);
 if (entry != NULL) {diff -Nru python3.10-3.10.4/debian/changelog python3.10-3.10.4/debian/changelog
--- python3.10-3.10.4/debian/changelog	2022-04-02 11:04:19.0 +0200
+++ python3.10-3.10.4/debian/changelog	2022-05-02 13:25:08.0 +0200
@@ -1,3 +1,10 @@
+python3.10 (3.10.4-3.1) UNRELEASED; urgency=medium
+
+  * Non-maintainer upload.
+  * Add patch to set FLAG_REF always for interned strings. Closes: #1010368
+
+ -- Johannes Schauer Marin Rodrigues   Mon, 02 May 2022 13:25:08 +0200
+
 python3.10 (3.10.4-3) unstable; urgency=medium
 
   * Build a python3.10-nopie package, diverting the python3.10
diff -Nru python3.10-3.10.4/debian/patches/series python3.10-3.10.4/debian/patches/series
--- python3.10-3.10.4/debian/patches/series	2022-04-02 11:04:19.0 +0200
+++ python3.10-3.10.4/debian/patches/series	2022-05-02 13:24:24.0 +0200
@@ -39,3 +39,4 @@
 fix-py_compile.diff
 reproducible-pyc.diff
 fix-ia64.diff
+use-FLAG_REF-always-for-interned-strings.diff
diff -Nru python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff
--- python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff	1970-01-01 01:00:00.0 +0100
+++ python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff	2022-05-02 13:25:07.0 +0200
@@ -0,0 +1,37 @@
+From 5fe4c1442ded6bdc4c724935d118e996fa022eac Mon Sep 17 00:00:00 2001
+From: Inada Naoki 
+Date: Sat, 14 Jul 2018 00:46:11 +0900
+Subject: [PATCH] Use FLAG_REF always for interned strings
+
+https://bugs.debian.org/1010368
+https://github.com/python/cpython/pull/8226
+https://github.com/python/cpython/issues/92132
+
+---
+ Python/marshal.c | 9 +++--
+ 1 file changed, 7 insertions(+), 2 deletions(-)
+
+diff --git a/Python/marshal.c b/Python/marshal.c
+index 4125240..341c9aa 100644
+--- a/Python/marshal.c
 b/Python/marshal.c
+@@ -298,9 +298,14 @@ w_ref(PyObject *v, char *flag, WFILE *p)
+ if (p->version < 3 || p->hashtable == NULL)
+ return 0; /* not writing object references */
+ 
+-/* if it has only one reference, it definitely isn't shared */
+-if (Py_REFCNT(v) == 1)
++/* If it has only one reference, it definitely isn't shared.
++ * But we use TYPE_REF always for interned string, to PYC file stable
++ * as possible.
++ */
++if (Py_REFCNT(v) == 1 &&
++!(PyUnicode_CheckExact(v) && PyUnicode_CHECK_INTERNED(v))) {
+ return 0;
++}
+ 
+ entry = _Py_hashtable_get_entry(p->hashtable, v);
+ if (entry != NULL) {
+-- 
+2.35.1
+


signature.asc
Description: signature


Bug#1010368: Maybe marshal nondeterminism?

2022-04-29 Thread Keith Amling
>From skimming some of cpython's "marshal" code [1] my best guess is that
first difference is between it thinking the `_m` string might have
another reference to it (and thus adding 0x80, or FLAG_REF to it) or
not.  This seems driven by whether or not python's object for the string
has other references (it calls Py_REFCNT(v) to decide, see line 302).

I assume the difference is whether or not python has bothered to collect
some other reference to the string or not.  Type "Z" is an interned
string type, TYPE_SHORT_ASCII_INTERNED, which therefore makes sense that
it might be shared with who knows what else.  I'm assuming this stops
reproducing when you change it to a unique name since no one else will
share the reference and you'll just deterministically get no FLAG_REF.

Just my best guesses.

Keith

[1] https://github.com/python/cpython/blob/main/Python/marshal.c