[issue34093] Reproducible pyc: FLAG_REF is not stable.

2021-09-20 Thread Eric Snow


Eric Snow  added the comment:

FWIW, I found a faster solution than calling `w_object()` twice.

Currently the logic for w_ref() (used for each "complex" object) looks like 
this:

* if ob_refcnt == 1
   * do not apply FLAG_REF
   * marshal normally
* else if seen for the first time
   * apply FLAG_REF
   * marshal normally
* otherwise
   * emit TYPE_REF
   * emit the ref index of the first instance
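
The rule listed above can be modelled in a few lines of Python.  This is only 
a sketch of the decision, not the C code in marshal.c: model_current_w_ref 
and its refcount_of callback are hypothetical names, and the output is a list 
of labelled tokens rather than real marshal bytes.  Passing the refcount in 
explicitly shows how the same object can produce different output.

```python
def model_current_w_ref(objs, refcount_of):
    """Toy model of the refcount-based rule; returns labelled tokens."""
    out = []
    refs = {}  # id(obj) -> ref index, only for objects that got FLAG_REF
    for obj in objs:
        key = id(obj)
        if refcount_of(obj) == 1:
            # Only one reference exists: written without FLAG_REF.
            out.append(("OBJ", obj))
        elif key not in refs:
            # Shared object seen for the first time: flag it.
            refs[key] = len(refs)
            out.append(("OBJ|FLAG_REF", obj))
        else:
            # Seen before: emit a back-reference instead of the object.
            out.append(("TYPE_REF", refs[key]))
    return out


spam = "spam"
# Same object, same value; only the (pretend) refcount differs.
print(model_current_w_ref([spam], lambda obj: 1))
print(model_current_w_ref([spam], lambda obj: 2))
```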

The faster solution looks like this:

* if seen for the first time
   * do not apply FLAG_REF
   * marshal normally
   * record the index of the type byte in the output stream
* else if seen for a second time
   * apply FLAG_REF to the byte at the earlier-recorded position
   * emit TYPE_REF
   * emit the ref index of the first instance
* otherwise
   * emit TYPE_REF
   * emit the ref index of the first instance
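
A minimal sketch of that back-patching idea, assuming a toy byte format 
rather than the real marshal one: toy_dump is a hypothetical helper that 
handles only short strings with a one-byte length.  It also numbers objects 
in the order they are first seen, which is an assumption of the toy format 
and not something taken from the branch.

```python
FLAG_REF = 0x80       # high bit of the type byte, as in marshal
TYPE_REF = ord("r")   # type byte for a back-reference
TYPE_STR = ord("u")   # toy type byte for a string


def toy_dump(objs):
    """Serialize a flat sequence of str objects into a bytearray."""
    out = bytearray()
    first_seen = {}  # id(obj) -> (ref index, offset of its type byte)
    for obj in objs:
        key = id(obj)
        if key not in first_seen:
            # First time: no FLAG_REF yet, but remember where the type byte is.
            first_seen[key] = (len(first_seen), len(out))
            data = obj.encode("utf-8")
            out.append(TYPE_STR)
            out.append(len(data))  # toy format: length must fit in one byte
            out += data
        else:
            index, pos = first_seen[key]
            # Second (or later) time: patch the earlier type byte.  OR-ing is
            # idempotent, so the third and later sightings need no special case.
            out[pos] |= FLAG_REF
            out.append(TYPE_REF)
            out.append(index)
    return bytes(out)


a, b = "spam", "eggs"
print(toy_dump([a, b, a]).hex())
```

Because the flag depends only on how often an object occurs in the data being 
written, the result no longer changes with the object's refcount.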

While this is faster, there are two downsides: it uses extra memory and it 
isn't practical when writing to a file.  However, I don't think either is a 
significant problem.  For the former, it can be mostly mitigated by using the 
negative values in WFILE.hashtable to store the type byte position.  For the 
latter, "marshal.dump()" is already a light wrapper around "marshal.dumps()" and 
for PyMarshal_WriteObjectToFile() we simply stick with the current unstable 
approach (or change it to do what "marshal.dump()" does).

FYI, I mostly have that implemented in a branch, but am not sure when I'll get 
back to it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34093] Reproducible pyc: FLAG_REF is not stable.

2021-09-20 Thread Eric Snow


Eric Snow  added the comment:

It turns out that I don't need this after all (once I merged gh-28392 and 
bpo-45188 was resolved).  That impacts how much time I have to spend on this, 
so I might not be able to pursue this further.  That said, I think it is worth 
doing and the PR I have up mostly does everything we need here.  So I'll see if 
I can follow this through. :)




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2021-09-16 Thread Eric Snow


Change by Eric Snow :


--
pull_requests: +26811
pull_request: https://github.com/python/cpython/pull/28379




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2021-09-16 Thread Eric Snow


Eric Snow  added the comment:

FYI, I unknowingly created a duplicate of this issue a few days ago, bpo-45186, 
and created a PR for it: https://github.com/python/cpython/pull/28379.  
Interestingly, while I did that PR independently, it has a lot in common with 
Inada-san's second PR.

My interest here is in how frozen modules can be affected by this problem, 
particularly between debug and non-debug builds.  See bpo-45020, where I'm 
working on freezing all the stdlib modules imported during startup.

--
components: +Interpreter Core -Extension Modules
nosy: +eric.snow
type:  -> behavior
versions: +Python 3.11 -Python 3.8




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2021-09-04 Thread Chih-Hsuan Yen


Change by Chih-Hsuan Yen :


--
nosy: +yan12125




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2019-07-15 Thread STINNER Victor


STINNER Victor  added the comment:

> According to Serhiy Storchaka, currently marshal.dumps() writes frozenset in 
> arbitrary order, and so frozenset serialization is not reproducible: 
> https://mail.python.org/pipermail/python-dev/2018-July/154604.html

I created bpo-37596 "Reproducible pyc: frozenset is not serialized in a 
deterministic order" to track this issue.




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-16 Thread INADA Naoki


Change by INADA Naoki :


--
pull_requests: +7827
stage:  -> patch review




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-13 Thread INADA Naoki


INADA Naoki  added the comment:

> Would it not be easy to add a named optional keyword
> argument, like "stable=True"?

My pull request did it.

But since then I got a hint on the mailing list and overwrote my PR with a 
different approach: use FLAG_REF for all interned strings.




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-13 Thread Christian Tismer


Christian Tismer  added the comment:

Why must this become slower?

To my knowledge, many projects prefer marshal over pickle
for suitable simple objects because it is
so very fast. I would not throw that away:

Would it not be easy to add a named optional keyword
argument, like "stable=True"?

--
nosy: +Christian.Tismer




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-12 Thread STINNER Victor


STINNER Victor  added the comment:

> So startup must not be slower than 4%.

I know. But Python does more than compile()+dumps() at the first run. I'm 
curious whether it is feasible to measure this cost. It may be hard to get 
reliable benchmarks, though, since I expect the difference to be very small, 
and I know very well that measuring Python startup is hard since it depends a 
lot on the filesystem, which is hard to measure.




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-12 Thread INADA Naoki


INADA Naoki  added the comment:

> STINNER Victor  added the comment:
>
> What is the time spent in marshal.dumps() at Python startup when Python has 
> to create all .pyc files? For example "./python -c pass" in the master branch 
> with no external dependency? My question is if the PR makes Python startup 5% 
> slower or less than 1% slower.

At startup, Python does more than compile()+marshal.dumps().
And as I wrote above, it makes compile()+marshal.dumps() only 4% slower.
So startup must not be slower than 4%.

Additionally, it happens only once if the pyc is writable.
(I don't know if marshal.dumps() is called when open(cache_path, 'wb') fails.)




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-12 Thread INADA Naoki


INADA Naoki  added the comment:

> STINNER Victor  added the comment:
>
> According to Serhiy Storchaka, currently marshal.dumps() writes frozenset in 
> arbitrary order, and so frozenset serialization is not reproducible:
> https://mail.python.org/pipermail/python-dev/2018-July/154604.html

PYTHONHASHSEED can be used to stabilize frozenset order.

On the other hand, the refcnt-based approach is more unstable.
Even when x is y, dumps(x) == dumps(y) is not guaranteed.
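
A small experiment along these lines, as a sketch only: it dumps the same 
tuple twice, once while the tuple holds the only reference to its string and 
once after a second reference exists.  Whether the two byte strings differ 
depends on the CPython version and build, so no result is claimed here; the 
loaded values compare equal either way.

```python
import marshal

# Build the string at run time so it is not an interned constant; the
# tuple then holds the only reference to it.
holder = ("".join(["spam", "-", "eggs"]),)

first = marshal.dumps(holder)

extra = holder[0]            # add a second reference to the same string
second = marshal.dumps(holder)

print("same bytes:", first == second)
print("same value:", marshal.loads(first) == marshal.loads(second))
```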




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-12 Thread STINNER Victor


STINNER Victor  added the comment:

What is the time spent in marshal.dumps() at Python startup when Python has to 
create all .pyc files? For example "./python -c pass" in the master branch with 
no external dependency? My question is if the PR makes Python startup 5% slower 
or less than 1% slower.




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-12 Thread STINNER Victor


STINNER Victor  added the comment:

According to Serhiy Storchaka, currently marshal.dumps() writes frozenset in 
arbitrary order, and so frozenset serialization is not reproducible:
https://mail.python.org/pipermail/python-dev/2018-July/154604.html
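
One way to check this on a given interpreter, sketched under the assumption 
that running a child process is acceptable: dump_with_seed is a hypothetical 
helper that marshals a small frozenset in a fresh interpreter with a chosen 
PYTHONHASHSEED.  Whether different seeds give different bytes depends on the 
Python version, so the script only prints the comparison.

```python
import marshal
import os
import subprocess
import sys

# Code run in the child interpreter: print the marshalled bytes as hex.
CHILD = (
    "import marshal;"
    "print(marshal.dumps(frozenset(['alpha', 'beta', 'gamma'])).hex())"
)


def dump_with_seed(seed):
    """Return the hex dump produced under the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


seed_1 = dump_with_seed(1)
seed_2 = dump_with_seed(2)
print("identical across seeds:", seed_1 == seed_2)
```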

--
nosy: +vstinner




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-11 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Look also at the alternate patches for issue20416. Some of them can solve this 
problem for simple types. If they have better performance, using them for 
simple types could save time. But this will complicate the code, and I'm not 
sure it is worth it.




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-11 Thread INADA Naoki


Change by INADA Naoki :


--
nosy: +benjamin.peterson, serhiy.storchaka




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-11 Thread INADA Naoki


INADA Naoki  added the comment:

marshal: Mean +- std dev: [master] 123 us +- 7 us -> [patched] 173 us +- 2 us: 1.41x slower (+41%)
compile+marshal: Mean +- std dev: [master] 5.28 ms +- 0.02 ms -> [patched] 5.47 ms +- 0.34 ms: 1.04x slower (+4%)
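
A rough way to take the same two kinds of measurement on one's own build, 
assuming plain timeit rather than the attached bm_marshal.py (whose contents 
are not shown here).  SOURCE and bench are hypothetical names, and the numbers 
will not be comparable to the ones above.

```python
import marshal
import timeit

# Small stand-in module source; a real benchmark would use a larger file.
SOURCE = (
    "def f(x):\n"
    "    return [x, x + 1, 'spam', 'spam']\n"
    "\n"
    "class C:\n"
    "    attr = ('a', 'b', 'a')\n"
)


def bench(number=200):
    """Return (seconds per dumps, seconds per compile+dumps)."""
    code = compile(SOURCE, "<bench>", "exec")
    t_marshal = timeit.timeit(lambda: marshal.dumps(code), number=number)
    t_both = timeit.timeit(
        lambda: marshal.dumps(compile(SOURCE, "<bench>", "exec")),
        number=number,
    )
    return t_marshal / number, t_both / number


per_dump, per_compile_and_dump = bench()
print(f"marshal:         {per_dump * 1e6:.1f} us")
print(f"compile+marshal: {per_compile_and_dump * 1e6:.1f} us")
```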

--
Added file: https://bugs.python.org/file47683/bm_marshal.py




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-11 Thread INADA Naoki


New submission from INADA Naoki :

PR-8226 makes marshal two-pass.  It may have a small overhead.

When compiling a module, the marshal overhead is negligible.
But what about other cases?  Should this change be optional?

And should we backport this to Python 3.7?
Or should distributors cherry-pick this?

--
stage: patch review -> 




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-11 Thread INADA Naoki


Change by INADA Naoki :


--
keywords: +patch
pull_requests: +7779
stage:  -> patch review




[issue34093] Reproducible pyc: FLAG_REF is not stable.

2018-07-11 Thread INADA Naoki


Change by INADA Naoki :


--
components: Extension Modules
nosy: inada.naoki
priority: normal
severity: normal
status: open
title: Reproducible pyc: FLAG_REF is not stable.
versions: Python 3.8
