One last note for clarity: the 8.56% quoted below is the increase in size of the stdlib as a whole (the Lib directory). The size of the pyc files alone goes from 28.471296 MB to 34.750464 MB, which is an increase of 22%.
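As a quick sanity check of the two percentages, here is a minimal sketch. The helper name `percent_increase` is only illustrative, and the Lib sizes are the rounded values printed by `du -h`, so the second result is only approximate.

```python
def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# pyc sizes in MB, as reported in this message.
pyc_growth = percent_increase(28.471296, 34.750464)

# Lib directory sizes in MB, as rounded by `du -h` in the quoted message.
lib_growth = percent_increase(70, 76)

print(f"pyc files only: {pyc_growth:.1f}%")
print(f"whole Lib tree: {lib_growth:.1f}% (from rounded sizes)")
```

The pyc figure comes out at roughly 22%. The Lib figure from the rounded sizes is roughly 8.6%, close to the 8.56% reported from the measured sizes.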
On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <pablog...@gmail.com> wrote:

> Some update on the numbers. We have made a draft implementation to
> corroborate the numbers with some more realistic tests, and it seems that
> our original calculations were wrong. The actual increase in size is
> considerably bigger than previously advertised.
>
> Using a bytes object to encode the final object and marshalling that to
> disk (so using uint8_t as the underlying type):
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M Lib
> 70M total
>
> AFTER:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M Lib
> 76M total
>
> So that's an increase of 8.56 % over the original value. This is storing
> the start offset and end offset with no compression whatsoever.
>
> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early discussion
>> about one of the main aspects of the PEP.
>>
>> The work we are preparing is to allow the interpreter to produce more
>> fine-grained error messages, pointing to the source associated with the
>> instructions that are failing. For example:
>>
>> Traceback (most recent call last):
>>   File "test.py", line 14, in <module>
>>     lel3(x)
>>     ^^^^^^^
>>   File "test.py", line 12, in lel3
>>     return lel2(x) / 23
>>            ^^^^^^^
>>   File "test.py", line 9, in lel2
>>     return 25 + lel(x) + lel(x)
>>                 ^^^^^^
>>   File "test.py", line 6, in lel
>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>                          ^^^^^^^^^^^^^^^^^^^^^
>> TypeError: 'NoneType' object is not subscriptable
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction, and this is what we want to
>> discuss (there is also some stack cost to re-raise exceptions, but that's
>> not a big problem in any case).
>> Given that column numbers are not very big compared with line numbers,
>> we plan to store these as unsigned chars or unsigned shorts. We ran some
>> experiments over the standard library and we found that the overhead
>> over all pyc files is:
>>
>> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
>>   extra size is 0.88 MB).
>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>>   extra size is 0.44 MB).
>>
>> One of the disadvantages of using chars is that we can only report
>> columns from 1 to 255, so if an error happens in a column bigger than
>> that we would have to exclude it (and not show the highlighting) for
>> that frame. Unsigned shorts will allow the values to go from 0 to 65535.
>>
>> Unfortunately these numbers are not easily compressible, as every
>> instruction would have very different offsets.
>>
>> There is also the possibility of not doing this based on some build flag
>> or when using -O, to allow users to opt out. But given that these
>> numbers can be quite useful to other tools like coverage measuring
>> tools, tracers, profilers and the like, adding conditional logic in many
>> places would complicate the implementation considerably and would
>> potentially reduce the usability of those tools, so we prefer not to
>> have the conditional logic. We believe this extra cost is very much
>> worth the better error reporting, but we understand and respect other
>> points of view.
>>
>> Does anyone see a better way to encode this information **without
>> complicating the implementation a lot**? What are people's thoughts on
>> the feature?
>>
>> Thanks in advance,
>>
>> Regards from cloudy London,
>> Pablo Galindo Salgado
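For readers following the thread, the fixed-width encoding discussed in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the draft implementation: the function names, the flat pair layout and the use of (0, 0) as a "no column information" marker are all hypothetical.

```python
import struct

# Hypothetical marker: a (0, 0) pair means "no highlighting for this instruction".
NO_COLUMNS = (0, 0)

def encode_columns(pairs, fmt="B"):
    """Pack (start_col, end_col) pairs, one pair per instruction.

    fmt "B" stores each column as an unsigned char (max 255),
    fmt "H" stores each column as an unsigned short (max 65535).
    """
    limit = {"B": 0xFF, "H": 0xFFFF}[fmt]
    out = bytearray()
    for start, end in pairs:
        if not (1 <= start <= end <= limit):
            # Column does not fit in the chosen width: drop the highlighting.
            start, end = NO_COLUMNS
        out += struct.pack(f"<2{fmt}", start, end)
    return bytes(out)

def decode_columns(data, fmt="B"):
    """Inverse of encode_columns: return the list of (start, end) pairs."""
    return list(struct.iter_unpack(f"<2{fmt}", data))

# Three made-up instructions; the last one ends past column 255.
pairs = [(5, 12), (12, 19), (26, 300)]
as_chars = encode_columns(pairs, "B")
as_shorts = encode_columns(pairs, "H")
print(len(as_chars), decode_columns(as_chars, "B"))
print(len(as_shorts), decode_columns(as_shorts, "H"))
```

With chars each instruction costs 2 extra bytes and the third pair loses its highlighting. With shorts each instruction costs 4 extra bytes and all three pairs survive, which is the trade-off described in the quoted message.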
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RICGTXCABZPK7RLDB7SISR4E64S6FEKR/
Code of Conduct: http://python.org/psf/codeofconduct/