On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <pablog...@gmail.com> wrote:
> Although we were originally not sympathetic with it, we may need to offer
> an opt-out mechanism for those users that care about the impact of the
> overhead of the new data in pyc files and in in-memory code objects, as
> was suggested by some folks (Thomas, Yury, and others). For this, we could
> propose that the functionality will be deactivated, along with the extra
> information, when Python is executed in optimized mode (``python -O``),
> and therefore pyo files will not have the overhead associated with the
> extra required data.

Just to be clear, .pyo files have not existed for a while:
https://www.python.org/dev/peps/pep-0488/.

> Notice that Python already strips docstrings in this mode so it would be
> "aligned" with the current mechanism of optimized mode.

This only kicks in at the -OO level.

> Although this complicates the implementation, it certainly is still much
> easier than dealing with compression (and more useful for those that
> don't want the feature). Notice that we also expect pessimistic results
> from compression as offsets would be quite random (although predominantly
> in the range 10 - 120).

I personally prefer the idea of dropping the data with -OO, since if you're
stripping out docstrings you're already hurting introspection capabilities
in the name of memory. Or one could go as far as to introduce -Os to do -OO
plus drop this extra data (see the sketch below the quoted thread).

As for .pyc file size, I personally wouldn't worry about it. If someone is
that space-constrained they either aren't using .pyc files or are only
shipping a single set of .pyc files under -OO and skipping source code. And
.pyc files are an implementation detail of CPython, so there shouldn't be
too much of a concern for other interpreters.

-Brett

> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> One last note for clarity: that's the increase of size in the stdlib; the
>> increase of size for pyc files goes from 28.471296 MB to 34.750464 MB,
>> which is an increase of 22%.
>>
>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> Some update on the numbers. We have made a draft implementation to
>>> corroborate the numbers with some more realistic tests, and it seems
>>> that our original calculations were wrong. The actual increase in size
>>> is quite a bit bigger than previously advertised.
>>>
>>> Using a bytes object to encode the final object and marshalling that to
>>> disk (so using uint8_t as the underlying type):
>>>
>>> BEFORE:
>>>
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 70M Lib
>>> 70M total
>>>
>>> AFTER:
>>>
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 76M Lib
>>> 76M total
>>>
>>> So that's an increase of 8.56% over the original value. This is storing
>>> the start offset and end offset with no compression whatsoever.
>>>
>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <pablog...@gmail.com>
>>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> We are preparing a PEP and we would like to start some early discussion
>>>> about one of the main aspects of the PEP.
>>>>
>>>> The work we are preparing is to allow the interpreter to produce more
>>>> fine-grained error messages, pointing to the source associated with the
>>>> instructions that are failing.
>>>> For example:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "test.py", line 14, in <module>
>>>>     lel3(x)
>>>>     ^^^^^^^
>>>>   File "test.py", line 12, in lel3
>>>>     return lel2(x) / 23
>>>>            ^^^^^^^
>>>>   File "test.py", line 9, in lel2
>>>>     return 25 + lel(x) + lel(x)
>>>>                 ^^^^^^
>>>>   File "test.py", line 6, in lel
>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>> TypeError: 'NoneType' object is not subscriptable
>>>>
>>>> The cost of this is having the start column number and end column
>>>> number information for every bytecode instruction, and this is what we
>>>> want to discuss (there is also some stack cost to re-raise exceptions,
>>>> but that's not a big problem in any case). Given that column numbers
>>>> are not very big compared with line numbers, we plan to store these as
>>>> unsigned chars or unsigned shorts. We ran some experiments over the
>>>> standard library and we found that the overhead for all pyc files is:
>>>>
>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
>>>>   extra size is 0.88 MB).
>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>>>>   the extra size is 0.44 MB).
>>>>
>>>> One of the disadvantages of using chars is that we can only report
>>>> columns from 1 to 255, so if an error happens in a column bigger than
>>>> that, we would have to exclude it (and not show the highlighting) for
>>>> that frame. Unsigned shorts would allow the values to go from 0 to
>>>> 65535.
>>>>
>>>> Unfortunately these numbers are not easily compressible, as every
>>>> instruction would have very different offsets.
>>>>
>>>> There is also the possibility of not doing this based on some build
>>>> flag or when using -O, to allow users to opt out. But given that these
>>>> numbers can be quite useful to other tools like coverage measuring
>>>> tools, tracers, profilers and such, adding conditional logic in many
>>>> places would complicate the implementation considerably and would
>>>> potentially reduce the usability of those tools, so we prefer not to
>>>> have the conditional logic. We believe this extra cost is very much
>>>> worth the better error reporting, but we understand and respect other
>>>> points of view.
>>>>
>>>> Does anyone see a better way to encode this information **without
>>>> complicating the implementation a lot**? What are people's thoughts on
>>>> the feature?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Regards from cloudy London,
>>>> Pablo Galindo Salgado
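For anyone trying to picture the encoding being discussed above, here is a
rough, purely illustrative sketch of packing one (start column, end column)
pair per bytecode instruction into a bytes object, either as unsigned chars
or as unsigned shorts. The sample offsets, the function names and the 255
"no location" sentinel are my own assumptions, not anything taken from the
draft implementation:

    import struct

    # Made-up column table: one (start_col, end_col) pair per instruction.
    offsets = [(0, 7), (11, 18), (25, 46)]

    def pack_as_chars(pairs):
        # 2 bytes per instruction (the ~1.5% case).  Columns above 254
        # cannot be represented, so fall back to a 255 sentinel meaning
        # "no location available; skip the highlighting for this frame".
        out = bytearray()
        for start, end in pairs:
            if start > 254 or end > 254:
                start = end = 255
            out += struct.pack("BB", start, end)
        return bytes(out)

    def pack_as_shorts(pairs):
        # 4 bytes per instruction (the ~3% case); columns 0-65535 fit.
        return b"".join(struct.pack("<HH", s, e) for s, e in pairs)

    print(len(pack_as_chars(offsets)))   # 6
    print(len(pack_as_shorts(offsets)))  # 12

Either way the table grows linearly with the number of instructions, and
since neighbouring pairs differ a lot, a generic compressor has little to
work with, which matches the "pessimistic results from compression" comment
above.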
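As for the -OO / hypothetical -Os idea earlier in this mail, the opt-out
could conceivably key off the existing optimization level, something along
these lines (just a sketch; the PEP may pick a completely different
mechanism, and -Os does not exist today):

    import sys

    def want_column_table():
        # sys.flags.optimize is 0 by default, 1 under -O and 2 under -OO,
        # the level at which docstrings are already stripped.  Dropping the
        # column table at the same level keeps it aligned with the existing
        # behaviour of optimized mode.
        return sys.flags.optimize < 2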
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2NKLQFEVHQ53QSTV4ZKQ3EYPCLTZXMFF/
Code of Conduct: http://python.org/psf/codeofconduct/