[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-04 Thread Brandt Bucher


Change by Brandt Bucher :


--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher


Change by Brandt Bucher :


--
pull_requests: +26586
stage: resolved -> patch review
pull_request: https://github.com/python/cpython/pull/28147




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher


Brandt Bucher  added the comment:

Found it. This particular build is configured with HAVE_ALIGNED_REQUIRED=1, 
which forces it to use fnv instead of siphash24 as its string hashing algorithm.
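For anyone trying to confirm which string hash a given interpreter uses, `sys.hash_info.algorithm` reports it at runtime (for example "siphash24", "siphash13", or "fnv", depending on the Python version and build configuration). Because PYTHONHASHSEED only fixes the seed and not the algorithm, two builds with different algorithms can order the same set differently even under the same seed. A minimal sketch; the helper name `string_hash_algorithm` is hypothetical:

```python
import sys


def string_hash_algorithm() -> str:
    """Return the name of the str/bytes hash algorithm this interpreter uses."""
    # Chosen at build time: by default a SipHash variant, or "fnv" on builds
    # that require aligned memory access (HAVE_ALIGNED_REQUIRED).
    return sys.hash_info.algorithm


if __name__ == "__main__":
    print(string_hash_algorithm())
```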

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher


Brandt Bucher  added the comment:

I'm compiling Clang now to try to reproduce using a UBSan build (I'm on Ubuntu, 
though).

I'm not entirely familiar with how these sanitizer builds work... could the 
implication be that we're hitting undefined behavior at some point? Or is it 
just a red herring?

Note also that the "set([float('nan'), b'a', b'b', b'c', 'x', 'y', 'z'])" and 
"frozenset([float('nan'), b'a', b'b', b'c', 'x', 'y', 'z'])" tests seem to be 
working just fine... meaning their ordering on this buildbot is different under 
PYTHONHASHSEEDs 0 and 1 (as expected). It may still be a 
platform-or-configuration-dependent ordering, though.

Raymond: off the top of your head, are there any obvious reasons this could be 
happening?

--
assignee:  -> brandtbucher




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher

Brandt Bucher  added the comment:

Thanks for finding this, Victor.

That failure is surprising to me. Is it really possible for the order of the 
elements in a set to vary based on platform or build configuration (even with a 
fixed PYTHONHASHSEED at runtime)?

Really, this looks like it’s only a bug in the test’s (read “my”) assumptions, 
not really in marshal itself. I’m glad I added this little sanity check, though.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread STINNER Victor


STINNER Victor  added the comment:

The test failed at:

def test_deterministic_sets(self):
    # bpo-37596: To support reproducible builds, sets and frozensets need to
    # have their elements serialized in a consistent order (even when they
    # have been scrambled by hash randomization):
    for kind in ("set", "frozenset"):
        for elements in (
            "float('nan'), b'a', b'b', b'c', 'x', 'y', 'z'",
            # Also test for bad interactions with backreferencing:
            "('string', 1), ('string', 2), ('string', 3)",
        ):
            s = f"{kind}([{elements}])"
            with self.subTest(s):
                # First, make sure that our test case still has different
                # orders under hash seeds 0 and 1. If this check fails, we
                # need to update this test with different elements:
                args = ["-c", f"print({s})"]
                _, repr_0, _ = assert_python_ok(*args, PYTHONHASHSEED="0")
                _, repr_1, _ = assert_python_ok(*args, PYTHONHASHSEED="1")
                self.assertNotEqual(repr_0, repr_1)  # <=== HERE
                (...)

It checks that the representation of a set is different for two different 
PYTHONHASHSEED values (0 and 1). On my Fedora 34 machine, I can confirm that 
they differ:

PYTHONHASHSEED=0:

vstinner@apu$ PYTHONHASHSEED=0 ./python -c "print(set([('string', 1), ('string', 2), ('string', 3)]))"
{('string', 1), ('string', 2), ('string', 3)}
vstinner@apu$ PYTHONHASHSEED=0 ./python -c "print(set([('string', 1), ('string', 2), ('string', 3)]))"
{('string', 1), ('string', 2), ('string', 3)}
vstinner@apu$ PYTHONHASHSEED=0 ./python -c "print(set([('string', 1), ('string', 2), ('string', 3)]))"
{('string', 1), ('string', 2), ('string', 3)}

versus PYTHONHASHSEED=1:

vstinner@apu$ PYTHONHASHSEED=1 ./python -c "print(set([('string', 1), ('string', 2), ('string', 3)]))"
{('string', 3), ('string', 1), ('string', 2)}
vstinner@apu$ PYTHONHASHSEED=1 ./python -c "print(set([('string', 1), ('string', 2), ('string', 3)]))"
{('string', 3), ('string', 1), ('string', 2)}
vstinner@apu$ PYTHONHASHSEED=1 ./python -c "print(set([('string', 1), ('string', 2), ('string', 3)]))"
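The same check can be reproduced outside the test suite with only the standard library. In this sketch `subprocess` stands in for `test.support.script_helper.assert_python_ok`, and `repr_under_seed` is a hypothetical helper name. Since the order under a given seed is platform- and build-dependent, it only reports whether seeds 0 and 1 happen to differ on the current machine:

```python
import os
import subprocess
import sys


def repr_under_seed(expr: str, seed: str) -> bytes:
    """Print `expr` in a fresh interpreter with a fixed PYTHONHASHSEED; return stdout."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print({expr})"],
        env=env,
        capture_output=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    expr = "set([('string', 1), ('string', 2), ('string', 3)])"
    # True if the two seeds produce different iteration orders on this build.
    print(repr_under_seed(expr, "0") != repr_under_seed(expr, "1"))
```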

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread STINNER Victor


STINNER Victor  added the comment:

I'm reopening the issue.

test_marshal failed on AMD64 Arch Linux Usan 3.x:
https://buildbot.python.org/all/#/builders/719/builds/108

==
FAIL: test_deterministic_sets (test.test_marshal.BugsTestCase) [set([('string', 1), ('string', 2), ('string', 3)])]
--
Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan/build/Lib/test/test_marshal.py", line 365, in test_deterministic_sets
    self.assertNotEqual(repr_0, repr_1)
    ^^^
AssertionError: b"{('string', 1), ('string', 2), ('string', 3)}\n" == b"{('string', 1), ('string', 2), ('string', 3)}\n"

==
FAIL: test_deterministic_sets (test.test_marshal.BugsTestCase) [frozenset([('string', 1), ('string', 2), ('string', 3)])]
--
Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan/build/Lib/test/test_marshal.py", line 365, in test_deterministic_sets
    self.assertNotEqual(repr_0, repr_1)
    ^^^
AssertionError: b"frozenset({('string', 1), ('string', 2), ('string', 3)})\n" == b"frozenset({('string', 1), ('string', 2), ('string', 3)})\n"

--
nosy: +vstinner
resolution: fixed -> 
status: closed -> open




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-31 Thread Brandt Bucher


Brandt Bucher  added the comment:


New changeset 51999c960e7fc45feebd629421dec6524a5fc803 by Brandt Bucher in 
branch 'main':
bpo-37596: Clean up the set/frozenset marshalling code (GH-28068)
https://github.com/python/cpython/commit/51999c960e7fc45feebd629421dec6524a5fc803


--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-30 Thread Guido van Rossum


Guido van Rossum  added the comment:

Thanks! This comes right in time, because we're working on freezing many more 
modules, and modules containing frozen sets didn't have a consistent frozen 
representation. Now they do!

(See issue45019, issue45020)

--
nosy: +gvanrossum




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-30 Thread Brandt Bucher


Change by Brandt Bucher :


--
pull_requests: +26512
pull_request: https://github.com/python/cpython/pull/28068




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Looking again, I think the code is correct as-is (though I am not sure about the 
depth adjustment).

Stylistically, it is different from the other blocks in w_complex_object(), 
which always have a "return" after setting p->error.  The new code jumps to 
"anyset_done" and then falls through to the end of the "elif" block rather than 
issuing a "return".

Since nothing else happens below the if-elif-else chain, this is without 
consequence.

--
status: open -> closed




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Brandt Bucher

Brandt Bucher  added the comment:

Hm, not quite sure what you mean. Are you talking about just replacing each of 
the new gotos with “Py_DECREF(pairs); return;”?

Error handling for this whole module is a bit unconventional. Some of the error 
paths in this function decrement the recursion depth counter, but I *think* 
that’s actually incorrect here… it looks like it’s our caller’s (w_object) 
responsibility to do that, error or no error.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Should the error paths decref the key and return NULL as they do elsewhere in 
the function?

--
status: closed -> open




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Łukasz Langa

Łukasz Langa  added the comment:

This is a bona fide enhancement and thus out of scope for backports. Since this 
is merged for 3.11, I'm closing the issue.

Thanks, everyone, this was some non-trivial design and implementation effort!

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Łukasz Langa

Łukasz Langa  added the comment:


New changeset 33d95c6facdfda3c8c0feffa7a99184e4abc2f63 by Brandt Bucher in 
branch 'main':
bpo-37596: Make `set` and `frozenset` marshalling deterministic (GH-27926)
https://github.com/python/cpython/commit/33d95c6facdfda3c8c0feffa7a99184e4abc2f63


--
nosy: +lukasz.langa




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Łukasz Langa

Change by Łukasz Langa :


--
versions: +Python 3.11 -Python 3.9




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher


Change by Brandt Bucher :


--
pull_requests: +26377
pull_request: https://github.com/python/cpython/pull/27926




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
Removed message: https://bugs.python.org/msg400182




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Here's pure Python code for experimentation:

from marshal import dumps, loads

def marshal_set(s):
    # Sort the elements by their own marshalled bytes, then marshal that list.
    return dumps(sorted(s, key=dumps))

def unmarshal_set(m):
    return frozenset(loads(m))

def test(s):
    assert unmarshal_set(marshal_set(s)) == s

test({("string", 1), ("string", 2), ("string", 3)})
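To check that this ordering does not depend on how the set happens to iterate, one option is to feed the same elements in every permutation and confirm the marshalled bytes never change. This sketch redefines `marshal_set` from the message above so it runs on its own; the permutations are only a stand-in for different hash-table iteration orders within one process:

```python
from itertools import permutations
from marshal import dumps, loads


def marshal_set(items):
    # Sort the elements by their own marshalled bytes, then marshal the result.
    return dumps(sorted(items, key=dumps))


def unmarshal_set(m):
    return frozenset(loads(m))


elements = [("string", 1), ("string", 2), ("string", 3)]

# Each permutation models a different iteration order of the same set.
outputs = {marshal_set(list(order)) for order in permutations(elements)}
print(len(outputs))  # 1: the bytes do not depend on the input order
```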

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Here's pure Python code for experimentation:

from marshal import dumps, loads

def marshal_set(s):
    # Pair each value with its marshalled bytes and sort on those bytes first.
    return dumps(sorted((dumps(value), value) for value in s))

def unmarshal_set(m):
    # Discard the sort keys and rebuild the set from the values.
    return {value for dump, value in loads(m)}

def test(s):
    assert unmarshal_set(marshal_set(s)) == s

test({("string", 1), ("string", 2), ("string", 3)})

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> I can clean it up and convert it to a PR if we decide 
> we want to go this route.

+1 This is by far the smallest intervention that has been discussed.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher


Brandt Bucher  added the comment:

This rough proof-of-concept seems to have the desired effect:

diff --git a/Python/marshal.c b/Python/marshal.c
index 1260704c74..70f9c4b109 100644
--- a/Python/marshal.c
+++ b/Python/marshal.c
@@ -503,9 +503,23 @@ w_complex_object(PyObject *v, char flag, WFILE *p)
             W_TYPE(TYPE_SET, p);
         n = PySet_GET_SIZE(v);
         W_SIZE(n, p);
-        while (_PySet_NextEntry(v, &pos, &value, &hash)) {
+        PyObject *pairs = PyList_New(0);
+        for (Py_ssize_t i = 0; _PySet_NextEntry(v, &pos, &value, &hash); i++) {
+            PyObject *pair = PyTuple_New(2);
+            PyObject *dump = PyMarshal_WriteObjectToString(value, p->version);
+            PyTuple_SET_ITEM(pair, 0, dump);
+            Py_INCREF(value);
+            PyTuple_SET_ITEM(pair, 1, value);
+            PyList_Append(pairs, pair);
+            Py_DECREF(pair);
+        }
+        PyList_Sort(pairs);
+        for (Py_ssize_t i = 0; i < n; i++) {
+            PyObject *pair = PyList_GET_ITEM(pairs, i);
+            PyObject *value = PyTuple_GET_ITEM(pair, 1);
             w_object(value, p);
         }
+        Py_DECREF(pairs);
     }
     else if (PyCode_Check(v)) {
         PyCodeObject *co = (PyCodeObject *)v;

I can clean it up and convert it to a PR if we decide we want to go this route.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher


Brandt Bucher  added the comment:

Ah, yeah.

Could we add a flag to disable the reference mechanism, just for frozenset 
elements? It would make marshalled frozensets a bit bigger (unless we 
re-marshalled each one after sorting)... but I still prefer that to adding more 
logic/subclasses to frozenset.
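The size cost can be estimated with an existing knob: marshal format version 2 predates object references (the `marshal` docs say version 3 added object instancing), so `marshal.dumps(x, 2)` writes every shared object out in full. This is only an illustration of the trade-off, not the per-element flag proposed here, which would be a change to marshal.c; the variable names are illustrative:

```python
import marshal

shared = "string"
elements = [(shared, 1), (shared, 2), (shared, 3)]

# Default format (version 4): later tuples point back at the first copy of `shared`.
with_refs = marshal.dumps(elements)

# Version 2 has no object references, so `shared` is written in full each time.
without_refs = marshal.dumps(elements, 2)

print(with_refs.count(b"string"), without_refs.count(b"string"))
print(len(with_refs), len(without_refs))
```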

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

No, it cannot be fixed in marshal itself.

s = {("string", 1), ("string", 2), ("string", 3)}

All tuples contain references to the same string. The first serialized tuple 
will contain the serialized string; all others will contain references to it. 
So the binary representation of each tuple depends on whether or not it is 
serialized first.
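A small sketch of this effect using only the public `marshal` API (the variable names are illustrative). Marshalled together in one stream, the shared string's bytes appear once, whereas the second tuple marshalled on its own contains them in full, so an element's bytes inside the stream depend on its position:

```python
import marshal

shared = "string"
a = (shared, 1)
b = (shared, 2)

# Marshal both tuples in one stream, as w_object() does for set elements.
stream = marshal.dumps([a, b])

# With object references (format version 3 and later) the string is written
# once; the later tuple only stores a back-reference to it.
print(stream.count(b"string"))  # 1

# On its own, `b` carries the full string bytes.
print(b"string" in marshal.dumps(b))  # True
```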

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher


Brandt Bucher  added the comment:

Could this issue be fixed in marshal itself? Off the top of my head, one 
possible option could be to use the marshalled bytes of each element as a sort 
key, rather than the elements themselves. So serialize, *then* sort?

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher


Change by Brandt Bucher :


--
nosy: +brandtbucher




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-14 Thread Filipe Laíns

Change by Filipe Laíns :


--
keywords: +patch
pull_requests: +26244
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/27769




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-07-30 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

The only way forward I can see is a strategy similar to what Serhiy proposes. 
It has a non-trivial complication (and a new type, which I am not very fond of), 
but it is a bit cleaner than changing the semantics of the type, which affects 
much more than the storage.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-07-30 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> I understand, the proposal would be to make frozensets keep the creation 
> order.

That would increase the memory consumption of all frozenset instances, which 
is unlikely to fly.

--
nosy: +pablogsal




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-07-30 Thread Felix C. Stegerman


Change by Felix C. Stegerman :


--
nosy: +obfusk




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-27 Thread Filipe Laíns

Filipe Laíns  added the comment:

I understand. The proposal, then, would be to make frozensets keep their creation order.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-27 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-26 Thread Inada Naoki


Inada Naoki  added the comment:

> If that's the case, then the argument Raymond provided against preserving 
> order does not seem that relevant, as we would only need to preserve the 
> order in the creation operation.

Note that PYC files are marshalled from code objects, which include frozenset 
instances, not from the AST.
By the time marshalling happens, the frozenset instance has already been 
created and its creation order is lost.
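A short sketch of this point (the expression is arbitrary): the compiler folds a constant set display used with `in` into a frozenset constant on the code object, so marshal only ever sees the finished frozenset, iterated in hash-table order:

```python
import marshal

# The set display on the right of `in` becomes a frozenset constant.
code = compile("x in {'a', 'b', 'c'}", "<demo>", "eval")
frozen = [c for c in code.co_consts if isinstance(c, frozenset)]
print(frozen)

# A .pyc file stores the marshalled code object, frozenset included.
data = marshal.dumps(code)
```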

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-26 Thread Filipe Laíns

Filipe Laíns  added the comment:

Ah, my bad! Though, thinking about it, it does make sense. If that's the case, 
then the argument Raymond provided against preserving order does not seem that 
relevant, as we would only need to preserve the order in the creation 
operation. What do you think? Is there anything I may be missing here? :)

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-26 Thread Inada Naoki


Inada Naoki  added the comment:

> What about normal sets?

pyc files don't contain regular sets, so they are out of scope for this issue.

--
nosy: +methane




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
Removed message: https://bugs.python.org/msg394414




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Is it possible to defer hash randomization until after pycs are generated?  The 
underlying problem here is an intentional scrambling of data.  If determinism 
is what is desired, then deferring that action addresses the actual cause of 
non-determinism rather than a downstream manifestation.

Scrambling hashes provides a somewhat limited (and bypassable) security value.  
What it protects against is maliciously chosen user keys.  That can only occur 
after modules are loaded.  The risk isn't intrinsic to the module itself.

Really, I don't think we should be rewriting sets to achieve this very limited 
goal that benefits very few users. That seems like the tail wagging the dog.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Filipe Laíns

Filipe Laíns  added the comment:

What about normal sets? They also suffer from the same issue.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Possible solution: add an ordered subtype of frozenset which would keep an 
array of items in the original order. The compiler only creates a frozenset 
when it optimizes "x in {1, 2}" or "for x in {1, 2}". It should now create an 
ordered frozenset from a list of constants (removing possible duplicates). The 
marshal module should save items in that order and restore ordered frozensets 
when loading data. It should not increase memory consumption much, because 
frozenset constants in code are rare and small.
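A pure-Python sketch of the data-structure shape, assuming a hypothetical `OrderedFrozenSet` name. The actual proposal would live in C inside the compiler and marshal; this only shows a frozenset that also records first-seen order and iterates in it:

```python
class OrderedFrozenSet(frozenset):
    """Hypothetical frozenset subtype that remembers insertion order."""

    def __new__(cls, iterable=()):
        # Drop duplicates but keep the first occurrence of each item.
        order = tuple(dict.fromkeys(iterable))
        self = super().__new__(cls, order)
        self._order = order
        return self

    def __iter__(self):
        # Iterate in the original order instead of hash-table order.
        return iter(self._order)

    def __repr__(self):
        return f"{type(self).__name__}({list(self._order)!r})"


s = OrderedFrozenSet([3, 1, 2, 1])
print(list(s), s == frozenset({1, 2, 3}))
```

Two limitations of this Python-level version: set operations such as `|` on it return plain frozensets, so the order is not carried through, and `marshal` only serializes the exact `set` and `frozenset` types, so it would reject this subclass.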

--
nosy: +serhiy.storchaka




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Filipe Laíns

Filipe Laíns  added the comment:

I would not expect SOURCE_DATE_EPOCH to sacrifice performance. During 
packaging, SOURCE_DATE_EPOCH is always set, and sometimes we need to perform 
expensive operations. We only need this behavior during cache generation, so 
using SOURCE_DATE_EPOCH as the flag is not optimal.

Backtracking a bit to your proposal for sorting the elements. Is it possible to 
have two different types with the same name? We need a unique identifier for 
each type.
After that, we need the type to allow sorting/comparing items, which AFAIK is 
not something we can guarantee.
We could certainly do the sorting where we are able to, and bail out if 
impossible, which I feel should handle the majority of cases. This is not 
optimal, but reasonable.

Is there any way we could do something like resetting the hash seed during 
cache generation?

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread STINNER Victor


STINNER Victor  added the comment:

> Another idea, would it be possible to add a flag to turn on reproducibility, 
> sacrificing performance?

The flag is the SOURCE_DATE_EPOCH env var, no?

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-11 Thread Filipe Laíns

Filipe Laíns  added the comment:

Another idea: would it be possible to add a flag to turn on reproducibility, 
sacrificing performance? This flag could be set when generating bytecode, where 
the performance hit shouldn't be that relevant.

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Chih-Hsuan Yen


Change by Chih-Hsuan Yen :


--
nosy:  -yan12125




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Filipe Laíns

Filipe Laíns  added the comment:

s/is can/can/

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

s/hundred/hundred thousand/

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Filipe Laíns

Filipe Laíns  added the comment:

> No, it would not.  We would also have to maintain order across set operations 
> such as intersection, which would become dramatically more expensive if 
> they had to maintain order.  For example, intersecting a million-element set 
> with a ten-element set always takes ten steps regardless of the order of 
> arguments, but to maintain order of the left hand operand could take a 
> hundred times more work.

Can these operations happen during bytecode generation? I am fairly new to 
these internals, so my understanding is not great. During bytecode generation, 
can code that performs such operations run?

--




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> Would it be reasonable to make it so that sets are 
> always created with the definition order?

No, it would not.  We would also have to maintain order across set operations 
such as intersection, which would become dramatically more expensive if they 
had to maintain order.  For example, intersecting a million-element set with a 
ten-element set always takes ten steps regardless of the order of arguments, 
but to maintain order of the left hand operand could take a hundred times more 
work.
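A simplified step-counting model of this cost argument, not CPython's setobject.c. The order-preserving variant here is the naive one that walks the entire left operand; a cleverer ordered design might do better. Sizes are scaled down to 100,000 elements to keep it quick, and the function names are hypothetical:

```python
def unordered_intersection(left, right):
    """Model of set intersection: iterate the smaller operand, probe the larger."""
    small, large = (left, right) if len(left) <= len(right) else (right, left)
    steps = 0
    result = set()
    for item in small:
        steps += 1
        if item in large:  # one hash lookup per step
            result.add(item)
    return result, steps


def order_preserving_intersection(left, right):
    """Naive model that keeps the left operand's order by walking all of it."""
    steps = 0
    result = []
    for item in left:
        steps += 1
        if item in right:
            result.append(item)
    return result, steps


big = set(range(100_000))
small = set(range(10))
print(unordered_intersection(big, small)[1])         # 10
print(unordered_intersection(small, big)[1])         # 10
print(order_preserving_intersection(big, small)[1])  # 100000
```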

--
nosy: +rhettinger




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-14 Thread Filipe Laíns

Filipe Laíns  added the comment:

Normal sets have the same issue, see bpo-43850.

Would it be reasonable to make it so that sets are always created with the 
definition order? Looking at the set implementation, this seems perfectly 
possible.

--
nosy: +FFY00




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2020-04-10 Thread Chih-Hsuan Yen


Chih-Hsuan Yen  added the comment:

issue34722 also talks about frozenset, nondeterministic order and sorting. 
Perhaps this ticket and that one cover the same issue?

--
nosy: +yan12125




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2020-04-08 Thread Jeffery To


Change by Jeffery To :


--
nosy: +jefferyto




[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2019-07-15 Thread STINNER Victor


New submission from STINNER Victor :

See bpo-29708 meta issue and https://reproducible-builds.org/ for reproducible 
builds.

pyc files are not fully reproducible yet: frozenset items are not serialized in 
a deterministic order.

One solution would be to modify marshal to sort frozenset items before 
serializing them. The issue is how to handle items which cannot be compared. 
Example:

>>> l=[float("nan"), b'bytes', 'unicode']
>>> l.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'bytes' and 'float'

One workaround for types which cannot be compared is to use the type name in 
the key used to compare items:

>>> l.sort(key=lambda x: (type(x).__name__, x))
>>> l
[b'bytes', nan, 'unicode']

Note: comparison between bytes and str raises a BytesWarning exception when 
using python3 -bb.

Second problem: how to handle exceptions when comparison raises an error anyway?
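One way to combine both ideas is sketched below. It tries the type-name key first and, if items of the same type still cannot be ordered, falls back to comparing each item's marshal bytes (the serialize-then-sort idea). The helper name `deterministic_order` is hypothetical. Note that multiple NaN values compare as neither less nor greater, so with the first key their relative order still follows the input; the marshal-bytes key does not have that problem:

```python
import marshal


def deterministic_order(items):
    """Hypothetical helper: order items reproducibly before serializing them."""
    items = list(items)
    try:
        # Group by type name first, so values of unlike types are never compared.
        return sorted(items, key=lambda x: (type(x).__name__, x))
    except TypeError:
        # Same-type values can still be unorderable (e.g. complex numbers);
        # fall back to ordering by each item's marshalled bytes.
        return sorted(items, key=marshal.dumps)


print(deterministic_order([float("nan"), b"bytes", "unicode"]))
print(deterministic_order([2j, 1j]) == deterministic_order([1j, 2j]))
```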


Another solution would be to use the PYTHONHASHSEED environment variable. For 
example, if SOURCE_DATE_EPOCH is set, PYTHONHASHSEED would be set to 0. This 
option is not my favorite because it disables a security fix against denial of 
service on dict and set:
https://python-security.readthedocs.io/vuln/hash-dos.html
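A sketch of the environment side of this option, assuming a hypothetical `reproducible_env` helper that a packaging tool might use before spawning the interpreter that writes pyc files (for example via `subprocess.run(..., env=reproducible_env())`). Pinning the seed only makes set and frozenset order repeatable between runs of the same interpreter build, and it gives up hash randomization for that process:

```python
import os


def reproducible_env(base=None):
    """Hypothetical helper: pin PYTHONHASHSEED when SOURCE_DATE_EPOCH is set."""
    env = dict(os.environ if base is None else base)
    if "SOURCE_DATE_EPOCH" in env:
        # A fixed seed makes set/frozenset iteration order repeatable between
        # runs, at the cost of disabling hash randomization for that process.
        env["PYTHONHASHSEED"] = "0"
    return env


print(reproducible_env({"SOURCE_DATE_EPOCH": "1600000000"}))
```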

--

Previous discussions on reproducible frozenset:

* https://mail.python.org/pipermail/python-dev/2018-July/154604.html
* https://bugs.python.org/issue34093#msg321523

See also bpo-34093: "Reproducible pyc: FLAG_REF is not stable" and PEP 552 
"Deterministic pycs".

--
components: Interpreter Core
messages: 347969
nosy: vstinner
priority: normal
severity: normal
status: open
title: Reproducible pyc: frozenset is not serialized in a deterministic order
versions: Python 3.9
