[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-13 Thread Barry


> On 13 May 2021, at 02:09, Mike Miller  wrote:
> 
> 
>> On 2021-05-11 16:12, Guido van Rossum wrote:
>> On Tue, May 11, 2021 at 4:07 PM Gregory P. Smith wrote:
>>> There's a difference between tracebacks dumped as plain text (utf-8) by
>>> traceback.print_exc() appearing on stderr or directed into log files and
>>> what can be displayed within a terminal.  It is highly unusual to emit
>>> terminal control characters into log files.
>> And yet it happens all the time. :-( Let's not risk that happening.
> 
> 
> os.isatty() is helpful in that situation, no?

Most tools that support colour output allow you to customise the colours
and offer always-colour, never-colour, and auto-colour options.

os.isatty() is useful for the auto case.
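A minimal sketch of that three-way switch (the option names are illustrative, not from any particular tool):

```python
import io
import os
import sys

def use_color(mode, stream=sys.stdout):
    """Decide whether to emit ANSI colour codes.

    `mode` is "always", "never", or "auto" -- the three settings most
    colour-capable tools expose.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    # "auto": colour only when writing to an interactive terminal, so
    # pipes and log files always receive plain text.
    try:
        return os.isatty(stream.fileno())
    except (OSError, ValueError):
        # Streams without a real file descriptor (StringIO, etc.)
        return False
```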

Barry

> 
> -Mike
> 

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/I5A6JSFVERDC6YG6CNHSJQTY4CDUTSY6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-12 Thread Mike Miller


On 2021-05-11 16:12, Guido van Rossum wrote:
On Tue, May 11, 2021 at 4:07 PM Gregory P. Smith wrote:

There's a difference between tracebacks dumped as plain text (utf-8) by

traceback.print_exc() appearing on stderr or directed into log files and
what can be displayed within a terminal.  It is highly unusual to emit
terminal control characters into log files.


And yet it happens all the time. :-( Let's not risk that happening.



os.isatty() is helpful in that situation, no?

-Mike

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QT2CTAPV7BVUHEUR6OY6DKWTNX6WM5MF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-11 Thread Guido van Rossum
On Tue, May 11, 2021 at 4:07 PM Gregory P. Smith  wrote:

>
>
> On Tue, May 11, 2021 at 3:33 PM Mike Miller 
> wrote:
>
>>
>> On 5/11/21 1:57 AM, Baptiste Carvello wrote:
>> > Le 11/05/2021 à 09:35, Steven D'Aprano a écrit :
>> >> On Mon, May 10, 2021 at 09:44:05PM -0400, Terry Reedy wrote:
>> >>
>> >>> The vanilla interpreter could be updated to recognize when it is
>> running
>> >>> on a simulated 35-year-old terminal that implements ansi-vt100 color
>> >>> codes rather than a simulated 40+-year-old black-and-white
>> teletype-like
>> >>> terminal.
>> >>
>> >> This is what is called "scope creep", although in this case
>> >> perhaps "scope gallop" is more appropriate *wink*
>> >> [...]
>> >
>> > Also: people paste tracebacks into issue reports, so all information has
>> > to survive copy-pasting.
>> >
>>
>> The first ANSI standard supported underlined text, didn't it?  The VT100
>> did.
>> That would make it part of the 40+ year old subset from the late 70's.
>>
>> While color might stand out more, underline suits the problem well, also
>> without
>> increasing the line count.
>>
>> There are a number of terminal emulators that support rich text copies,
>> but not
>> all of them.  This is added information however, so it not being
>> copy-pastable
>> everywhere shouldn't be a blocking requirement imho.
>>
>
> fancier REPL frontends have supported things like highlighting and such in
> their tracebacks, I expect they'll adopt column information and render it
> as such.
>
> There's a difference between tracebacks dumped as plain text (utf-8) by
> traceback.print_exc() appearing on stderr or directed into log files and
> what can be displayed within a terminal.  It is highly unusual to emit
> terminal control characters into log files.
>

And yet it happens all the time. :-( Let's not risk that happening.

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZR4VCUXV4GQB2KT2MFWNLJ7I4YXLZGMN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-11 Thread Gregory P. Smith
On Tue, May 11, 2021 at 3:33 PM Mike Miller  wrote:

>
> On 5/11/21 1:57 AM, Baptiste Carvello wrote:
> > Le 11/05/2021 à 09:35, Steven D'Aprano a écrit :
> >> On Mon, May 10, 2021 at 09:44:05PM -0400, Terry Reedy wrote:
> >>
> >>> The vanilla interpreter could be updated to recognize when it is
> running
> >>> on a simulated 35-year-old terminal that implements ansi-vt100 color
> >>> codes rather than a simulated 40+-year-old black-and-white
> teletype-like
> >>> terminal.
> >>
> >> This is what is called "scope creep", although in this case
> >> perhaps "scope gallop" is more appropriate *wink*
> >> [...]
> >
> > Also: people paste tracebacks into issue reports, so all information has
> > to survive copy-pasting.
> >
>
> The first ANSI standard supported underlined text, didn't it?  The VT100
> did.
> That would make it part of the 40+ year old subset from the late 70's.
>
> While color might stand out more, underline suits the problem well, also
> without
> increasing the line count.
>
> There are a number of terminal emulators that support rich text copies,
> but not
> all of them.  This is added information however, so it not being
> copy-pastable
> everywhere shouldn't be a blocking requirement imho.
>

Fancier REPL frontends have supported things like highlighting and such in
their tracebacks; I expect they'll adopt column information and render it
as such.

There's a difference between tracebacks dumped as plain text (utf-8) by
traceback.print_exc() appearing on stderr or directed into log files and
what can be displayed within a terminal.  It is highly unusual to emit
terminal control characters into log files.

-G


>
> -Mike
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7AQFQOCZCN44ML3UDY5RNWJJHOEDS4JN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-11 Thread Mike Miller


On 5/11/21 1:57 AM, Baptiste Carvello wrote:

Le 11/05/2021 à 09:35, Steven D'Aprano a écrit :

On Mon, May 10, 2021 at 09:44:05PM -0400, Terry Reedy wrote:


The vanilla interpreter could be updated to recognize when it is running
on a simulated 35-year-old terminal that implements ansi-vt100 color
codes rather than a simulated 40+-year-old black-and-white teletype-like
terminal.


This is what is called "scope creep", although in this case
perhaps "scope gallop" is more appropriate *wink*
[...]


Also: people paste tracebacks into issue reports, so all information has
to survive copy-pasting.



The first ANSI standard supported underlined text, didn't it?  The VT100 did. 
That would make it part of the 40+ year old subset from the late 70's.


While color might stand out more, underline suits the problem well, also without 
increasing the line count.
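For illustration, marking a column range with SGR 4 (underline), which the VT100 already implemented, needs no colour support at all (the helper name is ours):

```python
# ECMA-48 SGR codes: 4 turns underline on, 0 resets all attributes.
UNDERLINE = "\x1b[4m"
RESET = "\x1b[0m"

def underline_range(line, start, end):
    """Return `line` with columns start..end underlined (0-based,
    end-exclusive) -- marking the error range in-line instead of
    printing a second line of carets."""
    return line[:start] + UNDERLINE + line[start:end] + RESET + line[end:]
```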


There are a number of terminal emulators that support rich text copies, but not 
all of them.  This is added information however, so it not being copy-pastable 
everywhere shouldn't be a blocking requirement imho.


-Mike
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/W44D2BWWICNJTWPQOZUWVQEIJ6T3QWYM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-11 Thread Baptiste Carvello
Le 11/05/2021 à 09:35, Steven D'Aprano a écrit :
> On Mon, May 10, 2021 at 09:44:05PM -0400, Terry Reedy wrote:
> 
>> The vanilla interpreter could be updated to recognize when it is running 
>> on a simulated 35-year-old terminal that implements ansi-vt100 color 
>> codes rather than a simulated 40+-year-old black-and-white teletype-like 
>> terminal.
> 
> This is what is called "scope creep", although in this case 
> perhaps "scope gallop" is more appropriate *wink*
> [...]

Also: people paste tracebacks into issue reports, so all information has
to survive copy-pasting.

Cheers,
Baptiste
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/U2U44ZB3HFBAD3TGJ5X7Q3P6QCAIRYG2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-11 Thread Steven D'Aprano
On Mon, May 10, 2021 at 09:44:05PM -0400, Terry Reedy wrote:

> The vanilla interpreter could be updated to recognize when it is running 
> on a simulated 35-year-old terminal that implements ansi-vt100 color 
> codes rather than a simulated 40+-year-old black-and-white teletype-like 
> terminal.

This is what is called "scope creep", although in this case 
perhaps "scope gallop" is more appropriate *wink*

Supporting coloured output out of the box would be nice but if we want 
to do it properly, we would have to support at least ANSI-compatible 
terminals and Windows. And once we support it in tracebacks, you know 
people will say "if Python can print coloured text in a traceback, why 
can't I print coloured text in my own output?" and so that's going to 
rapidly end up needing something like colorama.

-- 
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2N2IOBTUSCZQVZSCPSDHKBCR5UCKXGC2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-10 Thread Terry Reedy

On 5/10/2021 6:07 AM, Steven D'Aprano wrote:

On Mon, May 10, 2021 at 05:34:12AM -0400, Terry Reedy wrote:

On 5/10/2021 3:28 AM, M.-A. Lemburg wrote:


I'm mostly thinking of tracebacks which go >10 levels deep, which is
rather common in larger applications. For those tracebacks, the top
entries are mostly noise you never look at when debugging. The proposal
now adds another 10 extra lines to jump over :-)


If the slice were instead marked with color tagging, as I hope will be
possible in IDLE and other IDEs, then no extra lines will be needed.


That's great for people using IDLE, but for those using the vanilla
Python interpreter, M-A.L makes a good point about increasing the
vertical size of the traceback which will almost always be ignored.


The vanilla interpreter could be updated to recognize when it is running 
on a simulated 35-year-old terminal that implements ansi-vt100 color 
codes rather than a simulated 40+-year-old black-and-white teletype-like 
terminal.


Making the enhancement available to nonstandard python-coded interfaces 
is a separate issue.



It's especially the case for beginners. It's hard enough to get newbies to
read *any* of the traceback. Anything which increases the visual noise
of that is going to make it harder.



--
Terry Jan Reedy

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MX5MNQN5WFETYXDE3OK5X6I7BEGUQTQ3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-10 Thread Pablo Galindo Salgado
That is going to be very hard to read, unfortunately, especially when the
line is not simple. Highlighting the range is quite a fundamental part of
the proposal and is driven by the warm reception of highlighted ranges for
syntax errors, which many users have reached out to say they find extremely
useful as a visual feature.

Also, people with vision problems have mentioned how important having a
highlighted section under the code is for quickly understanding the problem.

On Mon, 10 May 2021 at 11:46, Irit Katriel via Python-Dev <
python-dev@python.org> wrote:

>
> Another alternative is instead of
>
> File blah.py line 3:
>     return x/0
>            ^^^
>
> to have
>
> File blah.py line 3 cols 12-14:
>   x/0
>
>
> On Mon, May 10, 2021 at 11:12 AM Steven D'Aprano 
> wrote:
>
>> On Mon, May 10, 2021 at 05:34:12AM -0400, Terry Reedy wrote:
>> > On 5/10/2021 3:28 AM, M.-A. Lemburg wrote:
>> >
>> > >I'm mostly thinking of tracebacks which go >10 levels deep, which is
>> > >rather common in larger applications. For those tracebacks, the top
>> > >entries are mostly noise you never look at when debugging. The proposal
>> > >now adds another 10 extra lines to jump over :-)
>> >
>> > If the slice were instead marked with color tagging, as I hope will be
>> > possible in IDLE and other IDEs, then no extra lines will be needed
>>
>> That's great for people using IDLE, but for those using the vanilla
>> Python interpreter, M-A.L makes a good point about increasing the
>> vertical size of the traceback which will almost always be ignored.
>>
>> It's especially the case for beginners. It's hard enough to get newbies to
>> read *any* of the traceback. Anything which increases the visual noise
>> of that is going to make it harder.
>>
>>
>> --
>> Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WTWY2SZOKY7Y3MKVCETB4ILIBK7BQNYK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-10 Thread Irit Katriel via Python-Dev
Another alternative is instead of

File blah.py line 3:
    return x/0
           ^^^

to have

File blah.py line 3 cols 12-14:
  x/0
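As a sketch, the two formats being compared could be produced like this (the function names and column numbers are illustrative):

```python
def caret_style(filename, lineno, line, start, end):
    # Source line plus a line of carets underneath (the PEP's style).
    return (f'  File "{filename}", line {lineno}:\n'
            f"    {line}\n"
            f"    {' ' * start}{'^' * (end - start)}")

def cols_style(filename, lineno, line, start, end):
    # Alternative: column numbers in the header and only the offending
    # slice, so no extra caret line is needed.
    return (f'  File "{filename}", line {lineno}, cols {start}-{end}:\n'
            f"    {line[start:end]}")
```

The second form keeps the traceback's line count unchanged at the cost of repeating less of the source context.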


On Mon, May 10, 2021 at 11:12 AM Steven D'Aprano 
wrote:

> On Mon, May 10, 2021 at 05:34:12AM -0400, Terry Reedy wrote:
> > On 5/10/2021 3:28 AM, M.-A. Lemburg wrote:
> >
> > >I'm mostly thinking of tracebacks which go >10 levels deep, which is
> > >rather common in larger applications. For those tracebacks, the top
> > >entries are mostly noise you never look at when debugging. The proposal
> > >now adds another 10 extra lines to jump over :-)
> >
> > If the slice were instead marked with color tagging, as I hope will be
> > possible in IDLE and other IDEs, then no extra lines will be needed
>
> That's great for people using IDLE, but for those using the vanilla
> Python interpreter, M-A.L makes a good point about increasing the
> vertical size of the traceback which will almost always be ignored.
>
> > It's especially the case for beginners. It's hard enough to get newbies to
> read *any* of the traceback. Anything which increases the visual noise
> of that is going to make it harder.
>
>
> --
> Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YWNDAKU7RYBFWCRMXK3UZEYFE7NLUSNY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-10 Thread Steven D'Aprano
On Mon, May 10, 2021 at 05:34:12AM -0400, Terry Reedy wrote:
> On 5/10/2021 3:28 AM, M.-A. Lemburg wrote:
> 
> >I'm mostly thinking of tracebacks which go >10 levels deep, which is
> >rather common in larger applications. For those tracebacks, the top
> >entries are mostly noise you never look at when debugging. The proposal
> >now adds another 10 extra lines to jump over :-)
> 
> If the slice were instead marked with color tagging, as I hope will be 
possible in IDLE and other IDEs, then no extra lines will be needed

That's great for people using IDLE, but for those using the vanilla 
Python interpreter, M-A.L makes a good point about increasing the 
vertical size of the traceback which will almost always be ignored.

It's especially the case for beginners. It's hard enough to get newbies to 
read *any* of the traceback. Anything which increases the visual noise 
of that is going to make it harder.


-- 
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3ADVDPF4Z5DMXKG2CMJ3JTIN2SC76AUC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-10 Thread Terry Reedy

On 5/10/2021 3:28 AM, M.-A. Lemburg wrote:


I'm mostly thinking of tracebacks which go >10 levels deep, which is
rather common in larger applications. For those tracebacks, the top
entries are mostly noise you never look at when debugging. The proposal
now adds another 10 extra lines to jump over :-)


If the slice were instead marked with color tagging, as I hope will be 
possible in IDLE and other IDEs, then no extra lines will be needed.


--
Terry Jan Reedy

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RPAFKIZOC2LQY57S4TK4P5O527RGCGPK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-10 Thread Henk-Jaap Wagenaar
On Mon, 10 May 2021 at 08:34, M.-A. Lemburg  wrote:

> [...]

> PS: It looks like the discussion has wandered off to Discourse
> now. Should we continue there?
>
> --
> Marc-Andre Lemburg
> eGenix.com
>

Pablo seems to want to redirect the discussion there, yes; in particular to:

https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629

On Sun, 9 May 2021 at 16:25, Pablo Galindo Salgado 
wrote:

> [...]
> The discussion is happening in the discourse server:
>
>
> https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629
>
> To avoid splitting the discussion, *please redirect your comments there*
> instead of replying to this thread.
>
> Thanks!
>
> Regards from sunny London,
> Pablo Galindo Salgado
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6JBZUB3ABNZYMSDKDED2V2SYFP73QZV6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-10 Thread M.-A. Lemburg
On 09.05.2021 14:22, Larry Hastings wrote:
> On 5/9/21 3:00 AM, M.-A. Lemburg wrote:
>> BTW: For better readability, I'd also not output the ^^^ lines
>> for every stack level in the traceback, but just the last one,
>> since it's usually clear where the call to the next stack
>> level happens in the upper ones.
> 
> 
> Playing devil's advocate: in the unusual case, where it may be ambiguous
> where the call came from, outputting the ^^^ lines could be a real
> life-saver.
> 
> I concede this is rare,

I'm mostly thinking of tracebacks which go >10 levels deep, which is
rather common in larger applications. For those tracebacks, the top
entries are mostly noise you never look at when debugging. The proposal
now adds another 10 extra lines to jump over :-)

PS: It looks like the discussion has wandered off to Discourse
now. Should we continue there?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, May 10 2021)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RJ5M5DBEGAXUB4WVG5RKLFAZGOZEZ6G4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-09 Thread Gregory P. Smith
On Sun, May 9, 2021 at 9:13 AM Antoine Pitrou  wrote:

> On Sun, 09 May 2021 02:16:02 -
> "Jim J. Jewett"  wrote:
> > Antoine Pitrou wrote:
> > > On Sat, 8 May 2021 02:58:40 +
> > > Neil Schemenauer nas-pyt...@arctrix.com wrote:
> >
> > > > It would be cool if we could mmap the pyc files and have the VM run
> > > > code without an unmarshal step.
> > >
> > > What happens if another process mutates or truncates the file while the
> > > CPython VM is executing code from the mapped file?  Crash?
> >
> > Why would this be any different than whatever happens now?
>
> What happens now is that the pyc file is transferred at once to memory
> using regular IO.  So the chance is really slim that you read invalid
> data due to concurrent mutation.
>

Concurrent mutation isn't even what I was talking about.  We don't protect
against that today, as that isn't a concern.  But on the bulk of POSIX
systems where this would ever matter, software updates are done by moving
new files into place, because a rename is an atomic inode change.  So the
existing open file already in the process of being read is not changed,
but as soon as you make a new open call on the pathname you get a different
file than the last time that path was opened.

This is not theoretical.  I've seen production problems as a result
(zipimport - https://bugs.python.org/issue19081) making the incorrect
assumption that they can reopen a file that they've read once at a later
point in time.  So if we do open files later, we must code defensively and
assume they might not contain what we thought.
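The update-by-rename behaviour described above is easy to demonstrate (a sketch assuming POSIX rename semantics; the file names are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mod.pyc")
    with open(path, "w") as f:
        f.write("old contents")

    reader = open(path)           # opened before the "software update"

    tmp = path + ".new"
    with open(tmp, "w") as f:
        f.write("new contents")
    os.replace(tmp, path)         # atomic rename over the old name

    old_view = reader.read()      # the already-open file is unchanged
    reader.close()
    with open(path) as f:
        new_view = f.read()       # a fresh open sees the replacement
```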

We already have this problem with source code lines displayed in tracebacks
today, as those are read on demand.  But as that is debugging information
only, the wrong source lines being shown next to the filename +
line number in a traceback is something people just learn to ignore in these
situations.  We have the data to prevent this, we just never have.
https://bugs.python.org/issue44091 filed to track that.

Given this context, M.-A. Lemburg's alternative idea could have some merit
as it would synchronize our source skew behavior with our additional
debugging information behavior.  My initial reaction is that it's falling
into the trap of bundling too much into one place, though.

quoting M.-A. Lemburg:
> Create a new file format which supports enhanced debugging. This
> would include the source code in a indexed format, the AST and
> mappings between byte code, AST node, lines and columns.
>
> Python would then only use and load this file when it needs
> to print a traceback - much like it does today with the source
> code.
>
> The advantage is that you can add even more useful information
> for debugging while not making the default code distribution
> format take more memory (both disk and RAM).

Realistically: This is going to take more disk space in the common case
because in addition to the py, pyc, pyc.opt-1, pyc.opt-2 that some distros
apparently include all of today, there'd be a new pyc.debuginfo alongside
them.  The only benefit is that it isn't resident in RAM.  And someone
*could* choose to filter these out of their distro or container or
whatever-the-heck-their-package-format-is. But I really doubt that'll be
the default.

Not having debugging information for a problem you're trying to hunt down
and reproduce that only happens once in a blue moon is extraordinarily
frustrating.  Which is why people who value engineering time deploy with
debugging info.

There are environments where people intentionally do not deploy source
code.  But do want to get debugging data from tracebacks that they can then
correlate to their sources later for analysis (they're tracking exactly
which versions of pycs from which versions of sources were deployed).  It'd
be a shame to exclude column information for this scenario.

-gps
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6E7UZ5SFUAADUJUQ6DKPJIGO6CCGCNFU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-09 Thread Antoine Pitrou
On Sat, 8 May 2021 21:59:49 +0100
Pablo Galindo Salgado  wrote:
> > That could work, but in my personal opinion, I would prefer not to do
> > that as it complicates things and I think it is overkill.
> 
> Let me expand on this:
> 
> I recognize the problem that -OO can be quite unusable if some of your
> dependencies depend on docstrings and that It would be good to separate
> this from that option, but I am afraid of the following:
> 
> - New APIs in the marshal module and other places to pass down the extra
> information to read/write or not the extra information.
> - Complication of the pyc format with more entries in the header.
> - Complication of the implementation.
> 
> Given that reasons to deactivate this option exist but I expect them to
> be very rare, I would prefer to maximize simplicity and maintainability.

Agreed with Pablo.  Also, once we add a configuration option it becomes
delicate to later remove it.

Regards

Antoine.


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NUOZW4KRZWPJS4Y77DKD3ZJN24YBHAJ4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-09 Thread Antoine Pitrou
On Sun, 09 May 2021 02:16:02 -
"Jim J. Jewett"  wrote:
> Antoine Pitrou wrote:
> > On Sat, 8 May 2021 02:58:40 +
> > Neil Schemenauer nas-pyt...@arctrix.com wrote:  
> 
> > > It would be cool if we could mmap the pyc files and have the VM run
> > > code without an unmarshal step.
> >
> > What happens if another process mutates or truncates the file while the
> > CPython VM is executing code from the mapped file?  Crash?
> 
> Why would this be any different than whatever happens now?

What happens now is that the pyc file is transferred at once to memory
using regular IO.  So the chance is really slim that you read invalid
data due to concurrent mutation.

Regards

Antoine.


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JRK7RPRG2ENUJRTNJU3PD47QSLVOSVXN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-09 Thread Ethan Furman

On 5/9/21 3:00 AM, M.-A. Lemburg wrote:

> BTW: For better readability, I'd also not output the ^^^ lines
> for every stack level in the traceback, but just the last one,
> since it's usually clear where the call to the next stack
> level happens in the upper ones.

Usually, sure -- but in the unusual case those carets at every level can save a lot of time and frustration, and that is 
the goal of this enhancement, yes?  I know I have experienced that ambiguity more than once.


--
~Ethan~
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NQPGKRKGOECYWHB2FOOHXEMPV3XTO6CU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-09 Thread Larry Hastings

On 5/9/21 3:00 AM, M.-A. Lemburg wrote:

BTW: For better readability, I'd also not output the ^^^ lines
for every stack level in the traceback, but just the last one,
since it's usually clear where the call to the next stack
level happens in the upper ones.



Playing devil's advocate: in the unusual case, where it may be
ambiguous where the call came from, outputting the ^^^ lines could be a
real life-saver.


I concede this is rare,


//arry/



[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-09 Thread M.-A. Lemburg
On 08.05.2021 23:55, Gregory P. Smith wrote:
> Who non-hypothetically cares about a 22% pyc file size increase?  I don't 
> think
> we should be concerned.  I'm in favor of always writing them and the 20% size
> increase that results in.  If pyc size is an issue that should be its own
> separate enhancement PEP.  When it comes to pyc files there is more data we 
> may
> want to store in the future for performance reasons - I don't see them 
> shrinking
> without an independent effort.
>
> Caring about additional data retained in memory at runtime makes more sense to
> me as ram cost is much greater than storage cost and is paid repeatedly per
> process.  Storing an additional reference to None on code objects where a 
> column
> information table is perfectly fine.  That can be a -X style interpreter 
> startup
> option.  It isn't something that needs to impacted by the pyc files.  Pass 
> that
> option to the interpreter, and it just discards column info tables on code
> objects after loading them or compiling them.  If people want to optimize for 
> a
> shared pyc situation with memory mapping techniques, that is also something 
> that
> should be a separate enhancement PEP and not involved here.  People writing 
> code
> to use the column information should always check it for None first, that'd be
> something we document with the new feature.

I do care about both the increase in PYC size as well as the increase in
memory usage. When using Python in containers both are relevant and
so I'd like an option to switch this whole mechanism off that's
independent of optimization settings.

This idea is more about debugging during development and doesn't really
have much to do with optimization used for production use of Python,
so a separate flag or perhaps use of -v would be the more intuitive
approach.

Alternative idea:

Create a new file format which supports enhanced debugging. This
would include the source code in an indexed format, the AST, and
mappings between byte code, AST nodes, lines and columns.

Python would then only use and load this file when it needs
to print a traceback - much like it does today with the source
code.

The advantage is that you can add even more useful information
for debugging while not making the default code distribution
format take more memory (both disk and RAM).
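Python already uses a lazy-loading pattern like the one proposed here for source lines: linecache only reads a file when a traceback (or another consumer) asks for a line, so the data costs nothing until an error is printed. A sketch of that existing mechanism, for comparison:

```python
import linecache
import os
import tempfile

# Source lines are fetched on demand through linecache, not kept in
# memory up front; a separate debug-info file could plug into traceback
# printing the same way.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1\ny = 2\n")
    path = f.name

line = linecache.getline(path, 2)   # read only when requested, then cached
linecache.clearcache()
os.unlink(path)
print(repr(line))  # -> 'y = 2\n'
```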

BTW: For better readability, I'd also not output the caret lines
for every stack level in the traceback, but just the last one,
since it's usually clear where the call to the next stack
level happens in the upper ones.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, May 09 2021)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/



[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Richard Damon
On 5/8/21 10:16 PM, Jim J. Jewett wrote:
> Antoine Pitrou wrote:
>> On Sat, 8 May 2021 02:58:40 +
>> Neil Schemenauer nas-pyt...@arctrix.com wrote:
>>> It would be cool if we could mmap the pyc files and have the VM run
>>> code without an unmarshal step.
>>> What happens if another process mutates or truncates the file while the
>> CPython VM is executing code from the mapped file?  Crash?
> Why would this be any different than whatever happens now?  Just because it 
> is easier for another process to get (exclusive) access to the file if there 
> is a longer delay between loading the first part of the file and going back 
> for the docstrings and lnotab?
>
> -jJ

I think the issue being pointed out is that currently, when Python opens
the .pyc file for reading and keeps the file handle open, that will
block any other process from opening the file for writing, and thus
can't change the contents under it. Once it is all done, it can release
the lock as it won't need to read it again.

if it mapped the file into its address space, it would need a similar
sort of lock, but would need to keep it for the FULL execution of the program,
so that no other process could change the contents behind its back. I
think normal mmapping doesn't do this, but if that sort of lock is
available, then it probably should be used.

-- 
Richard Damon
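A sketch of the read-only mapping under discussion (the file contents here are illustrative only). Note that mmap itself provides no lock: another process can still rewrite the file behind the mapping, which is exactly the hazard described above.

```python
import mmap
import os
import tempfile

# A read-only mapping such as the VM would need for a pyc.  mmap does
# NOT lock the file -- a separate advisory/mandatory lock would be
# needed to keep another process from mutating it.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"magic+bytecode")
    path = f.name

with open(path, "rb") as fh, \
        mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    header = bytes(mm[:5])   # read a slice straight from the mapping

os.unlink(path)
print(header)  # -> b'magic'
```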



[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Jim J. Jewett
Antoine Pitrou wrote:
> On Sat, 8 May 2021 02:58:40 +
> Neil Schemenauer nas-pyt...@arctrix.com wrote:

> > It would be cool if we could mmap the pyc files and have the VM run
> > code without an unmarshal step.
> > What happens if another process mutates or truncates the file while the
> CPython VM is executing code from the mapped file?  Crash?

Why would this be any different than whatever happens now?  Just because it is 
easier for another process to get (exclusive) access to the file if there is a 
longer delay between loading the first part of the file and going back for the 
docstrings and lnotab?

-jJ


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Steven D'Aprano
Hi Chris,

On Fri, May 07, 2021 at 07:13:16PM -0700, Chris Jerdonek wrote:

> I'm not sure why you're sounding so negative. Pablo asked for ideas in his
> first message to the list:

I know that Pablo asked for ideas, but that doesn't mean that we are 
obliged to agree with every idea. This is a discussion list which 
means we discuss ideas, both to agree and disagree.

I don't think I'm being negative. I'm very positive about this proposal, 
and I don't want to see it get bogged down with bike-shedding about the 
precise compression/encoding algorithm used.

If Pablo, or any other volunteer such as yourself, wants to go down that 
track to investigate the data distribution, I'm not going to tell them 
that they must not. Go for it! But I'd rather not make this a mandatory 
prerequisite for the PEP.


[...]
> my reply wasn't about the pyc files on disk but about their representation
> in memory, which Pablo later said may be the main concern. So it's not
> compression algorithms like LZ4 so much as a method of encoding.

Okay, thanks for the clarification.


-- 
Steve
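For illustration of the in-memory encoding question raised above, here is a minimal sketch of one possible representation, assuming (as mentioned later in the thread) that most column offsets fit in a single byte. This is purely illustrative, not an encoding proposed by the PEP.

```python
# Illustrative only: pack per-instruction column offsets as one byte
# each, exploiting the observation that offsets are mostly small.
offsets = [10, 14, 14, 27, 120, 33]   # made-up sample data

packed = bytes(offsets)               # one byte per instruction
unpacked = list(packed)               # round-trips losslessly

print(unpacked == offsets, len(packed))  # -> True 6
```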


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Brett Cannon
On Sat, May 8, 2021 at 2:59 PM Gregory P. Smith  wrote:

>
>
> On Sat, May 8, 2021 at 2:09 PM Pablo Galindo Salgado 
> wrote:
>
>> > Why not put in it -O instead?  Then -O means lose asserts and lose
>> fine-grained tracebacks, while -OO continues to also
>> strip out doc strings.
>>
>> What if someone wants to keep asserts but do not want the extra data?
>>
>
> exactly my theme.  our existing -O and -OO already don't serve all user
> needs.  (I've witnessed people who need asserts but don't want docstrings
> wasting ram jump through hacky hoops to do that).  Complicating these
> options more by combining additional actions on them doesn't help.
>
> The reason we have -O and -OO generate their own special opt-1 and opt-2
> pyc files is because both of those change the generated bytecode and
> overall flow of the program by omitting instructions and data.  code using
> those will run slightly faster as there are fewer instructions.
>
> The change we're talking about here doesn't do that.  It just adds
> additional metadata to whatever instructions are generated.  So it doesn't
> feel -O related.
>

While I'm the opposite.  Metadata that is not necessary for CPython to
function and whose primary driver is better exception tracebacks totally
falls into the same camp as "I don't need docstrings" to me.


>
> While some people aren't going to like the overhead, I'm happy not
> offering the choice.
>
> > Greg, what do you think if instead of not writing it to the pyc file
> with -OO or adding a header entry to decide to read/write, we place None in
> the field? That way we can
> > leverage the option that we intend to add to deactivate displaying the
> traceback new information to reduce the data in the pyc files. The only
> problem
> > is that there will be still a tiny bit of overhead: an extra object per
> code object (None), but that's much much better than something that scales
> with the
> > number of instructions :)
> >
> > What's your opinion on this?
>
> I don't understand the pyc structure enough to comment on how that works,
>

Code to read a .pyc file and use it:
https://github.com/python/cpython/blob/a0bd9e9c11f5f52c7ddd19144c8230da016b53c6/Lib/importlib/_bootstrap_external.py#L951-L1015
(I'd explain more but it is the weekend and I technically shouldn't be
reading this thread).
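For reference, the fixed-size header that the code linked above parses is 16 bytes (per PEP 552): the magic number, a bit-flags word, and then the source mtime and size (or hashes, when the flags say so). A minimal sketch of reading it:

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

# Compile a tiny module, then unpack the 16-byte pyc header:
# magic (4 bytes), flags, source mtime, source size (little-endian
# 32-bit words).  flags is 0 for the default timestamp-based pycs.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "w") as fp:
        fp.write("x = 1\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "mod.pyc"))
    with open(pyc, "rb") as fp:
        magic, flags, mtime, size = struct.unpack("<4sIII", fp.read(16))

print(magic == importlib.util.MAGIC_NUMBER, flags, size)  # -> True 0 6
```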

-Brett


> but that sounds fine as a way to store less data, if these are stored as
> a side table rather than intermingled with each instruction itself.  *If
> anyone even cares about storing less data.*  I would not activate
> generation of that in py_compile and compileall based on the -X flag to
> disable display of tracebacks though.  A flag changing a setting of the
> current runtime regarding traceback printing detail level should not change
> the metadata in pyc files it emits.  I realize -O and -OO behave this way,
> but I don't view those as a great example. We're not writing new uniquely
> named pyc files, I suggest making this an explicit option for py_compile
> and compileall if we're going to support generation of pyc files without
> column data at all.
>
> I'm unclear on what the specific goals are with all of these option
> possibilities.
>
> Who non-hypothetically cares about a 22% pyc file size increase?  I don't
> think we should be concerned.  I'm in favor of always writing them and the
> 20% size increase that results in.  If pyc size is an issue that should be
> its own separate enhancement PEP.  When it comes to pyc files there is more
> data we may want to store in the future for performance reasons - I don't
> see them shrinking without an independent effort.
>
> Caring about additional data retained in memory at runtime makes more
> sense to me as ram cost is much greater than storage cost and is paid
> repeatedly per process.  Storing an additional reference to None on code
> objects, where a column information table would otherwise be, is perfectly
> fine.  That can be a -X style interpreter startup option.  It isn't something
> that needs to be impacted by the pyc files.  Pass that option to the interpreter, and it
> just discards column info tables on code objects after loading them or
> compiling them.  If people want to optimize for a shared pyc situation with
> memory mapping techniques, that is also something that should be a separate
> enhancement PEP and not involved here.  People writing code to use the
> column information should always check it for None first, that'd be
> something we document with the new feature.
>
> -gps
>
>
>>
>> On Sat, 8 May 2021 at 22:05, Ethan Furman  wrote:
>>
>>> On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
>>>  >> We can't piggy back on -OO as the only way to disable this, it needs
>>> to
>>>  >> have an option of its own.  -OO is unusable as code that relies on
>>> "doc"
>>>  >> strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html
>>>  >> exists.
>>>  >
>>>  > -OO is the only sensible way to disable the data. There are two
>>> things to disable:

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
Thanks Greg for the great, detailed response

I think I now understand your proposal better, and I think it is a good idea
that I would like to explore. I have some questions:

* One problem I see is that this will make the constructor of the code
object depend on global options in the interpreter. Someone using the C-API
and passing down that attribute will be surprised to find that it was
modified by a global. I am not saying it is bad, but I can see some problems
with that.

* The alternative is to modify all calls to the code object constructor.
This is easy to do in the compiler because code objects are constructed
very close to where the metadata is created, but it is going to be a pain in
other places, because code objects are also constructed where we would
either need new APIs or have to hide global state, as in the previous point.

* Another alternative is to walk the graph and strip the fields but that's
going to have a performance impact.

I think that if we decide to offer an opt out, this is actually one of the
best options but I am still slightly concerned about the extra complexity,
potential new APIs and maintainability.



On Sat, 8 May 2021, 22:55 Gregory P. Smith,  wrote:

>
>
> On Sat, May 8, 2021 at 2:09 PM Pablo Galindo Salgado 
> wrote:
>
>> > Why not put in it -O instead?  Then -O means lose asserts and lose
>> fine-grained tracebacks, while -OO continues to also
>> strip out doc strings.
>>
>> What if someone wants to keep asserts but do not want the extra data?
>>
>
> exactly my theme.  our existing -O and -OO already don't serve all user
> needs.  (I've witnessed people who need asserts but don't want docstrings
> wasting ram jump through hacky hoops to do that).  Complicating these
> options more by combining additional actions on them doesn't help.
>
> The reason we have -O and -OO generate their own special opt-1 and opt-2
> pyc files is because both of those change the generated bytecode and
> overall flow of the program by omitting instructions and data.  code using
> those will run slightly faster as there are fewer instructions.
>
> The change we're talking about here doesn't do that.  It just adds
> additional metadata to whatever instructions are generated.  So it doesn't
> feel -O related.
>
> While some people aren't going to like the overhead, I'm happy not
> offering the choice.
>
> > Greg, what do you think if instead of not writing it to the pyc file
> with -OO or adding a header entry to decide to read/write, we place None in
> the field? That way we can
> > leverage the option that we intend to add to deactivate displaying the
> traceback new information to reduce the data in the pyc files. The only
> problem
> > is that there will be still a tiny bit of overhead: an extra object per
> code object (None), but that's much much better than something that scales
> with the
> > number of instructions :)
> >
> > What's your opinion on this?
>
> I don't understand the pyc structure enough to comment on how that works,
> but that sounds fine as a way to store less data, if these are stored as a
> side table rather than intermingled with each instruction itself.  *If
> anyone even cares about storing less data.*  I would not activate
> generation of that in py_compile and compileall based on the -X flag to
> disable display of tracebacks though.  A flag changing a setting of the
> current runtime regarding traceback printing detail level should not change
> the metadata in pyc files it emits.  I realize -O and -OO behave this way,
> but I don't view those as a great example. We're not writing new uniquely
> named pyc files, I suggest making this an explicit option for py_compile
> and compileall if we're going to support generation of pyc files without
> column data at all.
>
> I'm unclear on what the specific goals are with all of these option
> possibilities.
>
> Who non-hypothetically cares about a 22% pyc file size increase?  I don't
> think we should be concerned.  I'm in favor of always writing them and the
> 20% size increase that results in.  If pyc size is an issue that should be
> its own separate enhancement PEP.  When it comes to pyc files there is more
> data we may want to store in the future for performance reasons - I don't
> see them shrinking without an independent effort.
>
> Caring about additional data retained in memory at runtime makes more
> sense to me as ram cost is much greater than storage cost and is paid
> repeatedly per process.  Storing an additional reference to None on code
> objects, where a column information table would otherwise be, is perfectly
> fine.  That can be a -X style interpreter startup option.  It isn't something
> that needs to be impacted by the pyc files.  Pass that option to the interpreter, and it
> just discards column info tables on code objects after loading them or
> compiling them.  If people want to optimize for a shared pyc situation with
> memory mapping techniques, that is also something that should be a separate
> enhancement PEP and 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 2:40 PM Jonathan Goble  wrote:

> On Sat, May 8, 2021 at 5:08 PM Pablo Galindo Salgado 
> wrote:
>
>> > Why not put in it -O instead?  Then -O means lose asserts and lose
>> fine-grained tracebacks, while -OO continues to also
>> strip out doc strings.
>>
>> What if someone wants to keep asserts but do not want the extra data?
>>
>
> What if I want to keep asserts and docstrings but don't want the extra
> data?
>
> Or actually, consider this. I *need* to keep asserts (because rightly or
> wrongly, I have a dependency, or my own code, that relies on them), but I
> *don't* want docstrings (because they're huge and I don't want the overhead
> in production), and I *don't* want the extra data in production either.
>
> Now what?
>
> I think what this illustrates is that the entire concept of optimizations
> in Python needs a complete rethink. It's already fundamentally broken for
> someone who wants to keep asserts but remove docstrings. Adding a third
> layer to this is a perfect opportunity to reconsider the whole paradigm.
>

Reconsidering "the whole paradigm" is always possible, but is a much larger
effort. It should not be something that blocks this enhancement from
happening.

We have discussed the -O mess before, on list and at summits and sprints.
-OO and the __pycache__ and longer .pyc names and versioned names were
among the results of that.  But we opted not to try and make life even more
complicated by expanding the test matrix of possible generated bytecode
even larger.

I'm getting off-topic here, and this should probably be a thread of its
> own, but perhaps what we should introduce is a compiler directive, similar
> to future statements but not that, that one can place at the top of a
> source file to tell the compiler "this file depends on asserts, don't
> optimize them out". Same for each thing that can be optimized that has a
> runtime behavior effect, including docstrings.
>

This idea has merit.  Worth keeping in mind for the future.  But agreed,
this goes beyond this threads topic so I'll leave it at that.


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 2:09 PM Pablo Galindo Salgado 
wrote:

> > Why not put in it -O instead?  Then -O means lose asserts and lose
> fine-grained tracebacks, while -OO continues to also
> strip out doc strings.
>
> What if someone wants to keep asserts but do not want the extra data?
>

exactly my theme.  our existing -O and -OO already don't serve all user
needs.  (I've witnessed people who need asserts but don't want docstrings
wasting ram jump through hacky hoops to do that).  Complicating these
options more by combining additional actions on them doesn't help.

The reason we have -O and -OO generate their own special opt-1 and opt-2
pyc files is because both of those change the generated bytecode and
overall flow of the program by omitting instructions and data.  code using
those will run slightly faster as there are fewer instructions.

The change we're talking about here doesn't do that.  It just adds
additional metadata to whatever instructions are generated.  So it doesn't
feel -O related.

While some people aren't going to like the overhead, I'm happy not offering
the choice.

> Greg, what do you think if instead of not writing it to the pyc file with
-OO or adding a header entry to decide to read/write, we place None in the
field? That way we can
> leverage the option that we intend to add to deactivate displaying the
traceback new information to reduce the data in the pyc files. The only
problem
> is that there will be still a tiny bit of overhead: an extra object per
code object (None), but that's much much better than something that scales
with the
> number of instructions :)
>
> What's your opinion on this?

I don't understand the pyc structure enough to comment on how that works,
but that sounds fine as a way to store less data, if these are stored as a
side table rather than intermingled with each instruction itself.  *If
anyone even cares about storing less data.*  I would not activate
generation of that in py_compile and compileall based on the -X flag to
disable display of tracebacks though.  A flag changing a setting of the
current runtime regarding traceback printing detail level should not change
the metadata in pyc files it emits.  I realize -O and -OO behave this way,
but I don't view those as a great example. We're not writing new uniquely
named pyc files, I suggest making this an explicit option for py_compile
and compileall if we're going to support generation of pyc files without
column data at all.

I'm unclear on what the specific goals are with all of these option
possibilities.

Who non-hypothetically cares about a 22% pyc file size increase?  I don't
think we should be concerned.  I'm in favor of always writing them and the
20% size increase that results in.  If pyc size is an issue that should be
its own separate enhancement PEP.  When it comes to pyc files there is more
data we may want to store in the future for performance reasons - I don't
see them shrinking without an independent effort.

Caring about additional data retained in memory at runtime makes more sense
to me as ram cost is much greater than storage cost and is paid repeatedly
per process.  Storing an additional reference to None on code objects, where
a column information table would otherwise be, is perfectly fine.  That can
be a -X style interpreter startup option.  It isn't something that needs to
be impacted by the pyc files.  Pass that option to the interpreter, and it just discards
column info tables on code objects after loading them or compiling them.
If people want to optimize for a shared pyc situation with memory mapping
techniques, that is also something that should be a separate enhancement
PEP and not involved here.  People writing code to use the column
information should always check it for None first, that'd be something we
document with the new feature.

-gps
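As a hedged sketch of the "check it for None first" discipline: the API that later shipped in Python 3.11 is code.co_positions(), which yields (lineno, end_lineno, col_offset, end_col_offset) per instruction, and any field may be None when column data is unavailable (e.g. stripped with -X no_debug_ranges). The helper name below is hypothetical.

```python
# Hedged sketch: position fields from co_positions() (Python 3.11+)
# may be None, so consumers must check before using them.  The helper
# name first_column is hypothetical.
def f():
    return 1 + 2

def first_column(code):
    """Return the first known start column in a code object, else None."""
    if not hasattr(code, "co_positions"):  # pre-3.11 interpreters
        return None
    for _line, _end_line, col, _end_col in code.co_positions():
        if col is not None:
            return col
    return None  # no column info in this build/mode

col = first_column(f.__code__)
ok = col is None or col >= 0
print(ok)  # -> True
```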


>
> On Sat, 8 May 2021 at 22:05, Ethan Furman  wrote:
>
>> On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
>>  >> We can't piggy back on -OO as the only way to disable this, it needs
>> to
>>  >> have an option of its own.  -OO is unusable as code that relies on
>> "doc"
>>  >> strings as application data such as
>> http://www.dabeaz.com/ply/ply.html
>>  >> exists.
>>  >
>>  > -OO is the only sensible way to disable the data. There are two things
>> to disable:
>>  >
>>  > * The data in pyc files
>>  > * Printing the exception highlighting
>>
>> Why not put in it -O instead?  Then -O means lose asserts and lose
>> fine-grained tracebacks, while -OO continues to also
>> strip out doc strings.
>>
>> --
>> ~Ethan~

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Jonathan Goble
On Sat, May 8, 2021 at 5:08 PM Pablo Galindo Salgado 
wrote:

> > Why not put in it -O instead?  Then -O means lose asserts and lose
> fine-grained tracebacks, while -OO continues to also
> strip out doc strings.
>
> What if someone wants to keep asserts but do not want the extra data?
>

What if I want to keep asserts and docstrings but don't want the extra data?

Or actually, consider this. I *need* to keep asserts (because rightly or
wrongly, I have a dependency, or my own code, that relies on them), but I
*don't* want docstrings (because they're huge and I don't want the overhead
in production), and I *don't* want the extra data in production either.

Now what?

I think what this illustrates is that the entire concept of optimizations
in Python needs a complete rethink. It's already fundamentally broken for
someone who wants to keep asserts but remove docstrings. Adding a third
layer to this is a perfect opportunity to reconsider the whole paradigm.

I'm getting off-topic here, and this should probably be a thread of its
own, but perhaps what we should introduce is a compiler directive, similar
to future statements but not that, that one can place at the top of a
source file to tell the compiler "this file depends on asserts, don't
optimize them out". Same for each thing that can be optimized that has a
runtime behavior effect, including docstrings. This would be minimally
disruptive since we can then stay at only two optimization levels and put
column info at whichever level we feel makes sense, but (provided the
compiler directives are used properly) information a particular file
requires to function correctly will never be removed from that file even if
the process-wide optimization level calls for it. I see no reason code with
asserts in one file and optimized code without asserts in another file
can't interact, and no reason code with docstrings and optimized code
without docstrings can't interact. Soft keywords would make this compiler
directive much easier, as it doesn't have to be shoehorned into the import
syntax (to suggest a bikeshed color, perhaps "retain asserts, docstrings"?)


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> I don't think the optional existence of column number information needs a
different kind of pyc file.  Just a flag in a pyc file's header at most.
It isn't a new type of file.

Greg, what do you think if instead of not writing it to the pyc file with
-OO or adding a header entry to decide to read/write, we place None in the
field? That way we can
leverage the option that we intend to add to deactivate displaying the
traceback new information to reduce the data in the pyc files. The only
problem
is that there will be still a tiny bit of overhead: an extra object per
code object (None), but that's much much better than something that scales
with the
number of instructions :)

What's your opinion on this?


On Sat, 8 May 2021 at 21:45, Gregory P. Smith  wrote:

>
> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado 
> wrote:
>
>> > We can't piggy back on -OO as the only way to disable this, it needs
>> to have an option of its own.  -OO is unusable as code that relies on
>> "doc"strings as application data such as
>> http://www.dabeaz.com/ply/ply.html exists.
>>
>> -OO is the only sensible way to disable the data. There are two things to
>> disable:
>>
>
> nit: I wouldn't choose the word "sensible" given that -OO is already
> fundamentally unusable without knowing if any code in your entire
> transitive dependencies might depend on the presence of docstrings...
>
>
>>
>> * The data in pyc files
>> * Printing the exception highlighting
>>
>> Printing the exception highlighting can be disabled via combo of
>> environment variable / -X option but collecting the data can only be
>> disabled by -OO. The reason is that this will end in pyc files
>> so when the data is not there, a different kind of pyc file needs to be
>> produced, and I really don't want to have another set of pyc file extensions
>> just to deactivate this. Notice also that a configure-time
>> variable won't work because it will cause crashes when reading pyc
>> files produced by the interpreter compiled without the flag.
>>
>
> I don't think the optional existence of column number information needs a
> different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
>
>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith  wrote:
>>
>>>
>>>
>>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
 Hi Brett,

 Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.


 Whoops, my bad, I wanted to refer to the pyc files that are generated
 with -OO, which have the "opt-2" prefix.

 This only kicks in at the -OO level.


 I will correct the PEP so it reflects this more exactly.

 I personally prefer the idea of dropping the data with -OO since if
> you're stripping out docstrings you're already hurting introspection
> capabilities in the name of memory. Or one could go as far as to introduce
> -Os to do -OO plus dropping this extra data.


 This is indeed the plan, sorry for the confusion. The opt-out mechanism
 is using -OO, precisely as we are already dropping other data.

>>>
>>> We can't piggy back on -OO as the only way to disable this, it needs to
>>> have an option of its own.  -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -gps
>>>
>>>

 Thanks for the clarifications!



 On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:

>
>
> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> Although we were originally not sympathetic with it, we may need to
>> offer an opt-out mechanism for those users that care about the impact of
>> the overhead of the new data in pyc files
>> and in in-memory code objects, as was suggested by some folks (Thomas,
>> Yury, and others). For this, we could propose that the functionality will
>> be deactivated along with the extra
>> information when Python is executed in optimized mode (``python -O``)
>> and therefore pyo files will not have the overhead associated with the
>> extra required data.
>>
>
> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.
>
>
>> Notice that Python
>> already strips docstrings in this mode so it would be "aligned" with
>> the current mechanism of optimized mode.
>>
>
> This only kicks in at the -OO level.
>
>
>>
>> Although this complicates the implementation, it certainly is still
>> much easier than dealing with compression (and more useful for those that
>> don't want the feature). Notice that we also
>> expect pessimistic results from compression as offsets would be quite
>> random (although predominantly in the range 10 - 120).
>>

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> Why not put it in -O instead?  Then -O means lose asserts and lose
fine-grained tracebacks, while -OO continues to also
strip out doc strings.

What if someone wants to keep asserts but do not want the extra data?

On Sat, 8 May 2021 at 22:05, Ethan Furman  wrote:

> On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
>  >> We can't piggy back on -OO as the only way to disable this, it needs to
>  >> have an option of its own.  -OO is unusable as code that relies on
> "doc"
>  >> strings as application data such as http://www.dabeaz.com/ply/ply.html
>  >> exists.
>  >
>  > -OO is the only sensible way to disable the data. There are two things
> to disable:
>  >
>  > * The data in pyc files
>  > * Printing the exception highlighting
>
> Why not put it in -O instead?  Then -O means lose asserts and lose
> fine-grained tracebacks, while -OO continues to also
> strip out doc strings.
>
> --
> ~Ethan~
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WK4KXZPOSWYMI3C5AILQCEYVZRCDFL7N/
Code of Conduct: http://python.org/psf/codeofconduct/
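
For readers following the thread, the existing -O / -OO behavior being debated is easy to check directly. A minimal sketch (it assumes only that the running CPython is invokable; the probe function `f` is purely illustrative):

```python
import subprocess
import sys

# Probe how the existing optimization levels already change behavior:
# -O drops assert statements (so __debug__ becomes False), and -OO
# additionally strips docstrings from compiled Python code.
PROBE = 'def f():\n    "a docstring"\nprint(__debug__, f.__doc__ is None)'

for flags in ([], ["-O"], ["-OO"]):
    result = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        capture_output=True, text=True, check=True,
    )
    print(flags, "->", result.stdout.strip())
```

With no flags this prints `True False`; `-O` prints `False False` (asserts dropped, docstrings kept); `-OO` prints `False True`. That coupling is why tying the new column data to -OO also ties it to docstring stripping.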


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Ethan Furman

On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
>> We can't piggyback on -OO as the only way to disable this, it needs to
>> have an option of its own.  -OO is unusable as code that relies on "doc"
>> strings as application data such as http://www.dabeaz.com/ply/ply.html
>> exists.
>
> -OO is the only sensible way to disable the data. There are two things to 
disable:
>
> * The data in pyc files
> * Printing the exception highlighting

Why not put it in -O instead?  Then -O means lose asserts and lose fine-grained tracebacks, while -OO continues to also 
strip out doc strings.


--
~Ethan~
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BEE4BGUZHXBTVDPOW5R4DC3S463XC3EJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> That could work, but in my personal opinion, I would prefer not to do
> that, as it complicates things and I think it is overkill.

Let me expand on this:

I recognize the problem that -OO can be quite unusable if some of your
dependencies depend on docstrings, and that it would be good to separate
this from that option, but I am afraid of the following:

- New APIs in the marshal module and other places to pass down the extra
information to read/write or not the extra information.
- Complication of the pyc format with more entries in the header.
- Complication of the implementation.

While the reasons to deactivate this option exist, I expect them to be very
rare, so I would prefer to maximize simplicity and maintainability.

On Sat, 8 May 2021 at 21:50, Pablo Galindo Salgado 
wrote:

> > I don't think the optional existence of column number information needs
> a different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
> That could work, but in my personal opinion, I would prefer not to do that,
> as it complicates things and I think it is overkill.
>
> On Sat, 8 May 2021 at 21:45, Gregory P. Smith  wrote:
>
>>
>> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado 
>> wrote:
>>
>>> > We can't piggyback on -OO as the only way to disable this, it needs
>>> to have an option of its own.  -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -OO is the only sensible way to disable the data. There are two things
>>> to disable:
>>>
>>
>> nit: I wouldn't choose the word "sensible" given that -OO is already
>> fundamentally unusable without knowing if any code in your entire
>> transitive dependencies might depend on the presence of docstrings...
>>
>>
>>>
>>> * The data in pyc files
>>> * Printing the exception highlighting
>>>
>>> Printing the exception highlighting can be disabled via a combination of
>>> an environment variable and a -X option, but collecting the data can only
>>> be disabled by -OO. The reason is that this data will end up in pyc files,
>>> so when it is not there, a different kind of pyc file would need to be
>>> produced, and I really don't want another set of pyc file extensions just
>>> to deactivate this. Notice also that a configure-time variable won't work,
>>> because it would cause crashes when reading pyc files produced by an
>>> interpreter compiled without the flag.
>>>
>>
>> I don't think the optional existence of column number information needs a
>> different kind of pyc file.  Just a flag in a pyc file's header at most.
>> It isn't a new type of file.
>>
>>
>>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith  wrote:
>>>


 On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
 pablog...@gmail.com> wrote:

> Hi Brett,
>
> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>
>
> Whoops, my bad, I wanted to refer to the pyc files that are generated
> with -OO, which have the "opt-2" prefix.
>
> This only kicks in at the -OO level.
>
>
> I will correct the PEP so it reflects this more exactly.
>
> I personally prefer the idea of dropping the data with -OO since if
>> you're stripping out docstrings you're already hurting introspection
>> capabilities in the name of memory. Or one could go as far as to 
>> introduce
>> -Os to do -OO plus dropping this extra data.
>
>
> This is indeed the plan, sorry for the confusion. The opt-out
> mechanism is using -OO, precisely as we are already dropping other data.
>

 We can't piggyback on -OO as the only way to disable this, it needs to
 have an option of its own.  -OO is unusable as code that relies on
 "doc"strings as application data such as
 http://www.dabeaz.com/ply/ply.html exists.

 -gps


>
> Thanks for the clarifications!
>
>
>
> On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:
>
>>
>>
>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>> pablog...@gmail.com> wrote:
>>
>>> Although we were originally not sympathetic with it, we may need to
>>> offer an opt-out mechanism for those users that care about the impact of
>>> the overhead of the new data in pyc files
> and in in-memory code objects, as was suggested by some folks (Thomas,
>>> Yury, and others). For this, we could propose that the functionality 
>>> will
>>> be deactivated along with the extra
>>> information when Python is executed in optimized mode (``python
>>> -O``) and therefore pyo files will not have the overhead associated with
>>> the extra required data.
>>>
>>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>>
>>
>>> Notice that Python
>>> already strips 
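
A sketch of the "flag in a pyc file's header" idea raised above: since PEP 552, every .pyc starts with a 16-byte header consisting of a 4-byte magic number, a 4-byte bit-field flags word, and two more 32-bit words. A "column info present" bit could in principle live in that existing flags word rather than in a new file extension. This is illustrative only; the PEP under discussion does not define such a bit:

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

# Compile a trivial module and inspect the PEP 552 pyc header:
# bytes 0-3 are the magic number, bytes 4-7 a bit-field flags word.
# A hypothetical "column info present" flag could reuse this word.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "m.py")
    with open(src, "w") as fh:
        fh.write("x = 1\n")
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "m.pyc"),
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc, "rb") as fh:
        header = fh.read(16)

magic = header[:4]
(flags,) = struct.unpack("<I", header[4:8])
print(magic == importlib.util.MAGIC_NUMBER, flags)  # True 0 for a timestamp-based pyc
```

Interpreters reading such a pyc would check the flag and fall back to column-less tracebacks, instead of crashing on an unexpected file layout.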

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> I don't think the optional existence of column number information needs a
different kind of pyc file.  Just a flag in a pyc file's header at most.
It isn't a new type of file.

That could work, but in my personal opinion, I would prefer not to do that,
as it complicates things and I think it is overkill.

On Sat, 8 May 2021 at 21:45, Gregory P. Smith  wrote:

>
> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado 
> wrote:
>
>> > We can't piggyback on -OO as the only way to disable this, it needs
>> to have an option of its own.  -OO is unusable as code that relies on
>> "doc"strings as application data such as
>> http://www.dabeaz.com/ply/ply.html exists.
>>
>> -OO is the only sensible way to disable the data. There are two things to
>> disable:
>>
>
> nit: I wouldn't choose the word "sensible" given that -OO is already
> fundamentally unusable without knowing if any code in your entire
> transitive dependencies might depend on the presence of docstrings...
>
>
>>
>> * The data in pyc files
>> * Printing the exception highlighting
>>
>> Printing the exception highlighting can be disabled via a combination of
>> an environment variable and a -X option, but collecting the data can only
>> be disabled by -OO. The reason is that this data will end up in pyc files,
>> so when it is not there, a different kind of pyc file would need to be
>> produced, and I really don't want another set of pyc file extensions just
>> to deactivate this. Notice also that a configure-time variable won't work,
>> because it would cause crashes when reading pyc files produced by an
>> interpreter compiled without the flag.
>>
>
> I don't think the optional existence of column number information needs a
> different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
>
>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith  wrote:
>>
>>>
>>>
>>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
 Hi Brett,

 Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.


 Whoops, my bad, I wanted to refer to the pyc files that are generated
 with -OO, which have the "opt-2" prefix.

 This only kicks in at the -OO level.


 I will correct the PEP so it reflects this more exactly.

 I personally prefer the idea of dropping the data with -OO since if
> you're stripping out docstrings you're already hurting introspection
> capabilities in the name of memory. Or one could go as far as to introduce
> -Os to do -OO plus dropping this extra data.


 This is indeed the plan, sorry for the confusion. The opt-out mechanism
 is using -OO, precisely as we are already dropping other data.

>>>
>>> We can't piggyback on -OO as the only way to disable this, it needs to
>>> have an option of its own.  -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -gps
>>>
>>>

 Thanks for the clarifications!



 On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:

>
>
> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> Although we were originally not sympathetic with it, we may need to
>> offer an opt-out mechanism for those users that care about the impact of
>> the overhead of the new data in pyc files
>> and in in-memory code objects, as was suggested by some folks (Thomas,
>> Yury, and others). For this, we could propose that the functionality will
>> be deactivated along with the extra
>> information when Python is executed in optimized mode (``python -O``)
>> and therefore pyo files will not have the overhead associated with the
>> extra required data.
>>
>
> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.
>
>
>> Notice that Python
>> already strips docstrings in this mode so it would be "aligned" with
>> the current mechanism of optimized mode.
>>
>
> This only kicks in at the -OO level.
>
>
>>
>> Although this complicates the implementation, it certainly is still
>> much easier than dealing with compression (and more useful for those that
>> don't want the feature). Notice that we also
>> expect pessimistic results from compression as offsets would be quite
>> random (although predominantly in the range 10 - 120).
>>
>
> I personally prefer the idea of dropping the data with -OO since if
> you're stripping out docstrings you're already hurting introspection
> capabilities in the name of memory. Or one could go as far as to introduce
> -Os to do -OO plus dropping this extra data.
>
> As for .pyc file size, I personally wouldn't worry about it. If
> someone is that space-constrained they 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado 
wrote:

> > We can't piggyback on -OO as the only way to disable this, it needs to
> have an option of its own.  -OO is unusable as code that relies on
> "doc"strings as application data such as
> http://www.dabeaz.com/ply/ply.html exists.
>
> -OO is the only sensible way to disable the data. There are two things to
> disable:
>

nit: I wouldn't choose the word "sensible" given that -OO is already
fundamentally unusable without knowing if any code in your entire
transitive dependencies might depend on the presence of docstrings...


>
> * The data in pyc files
> * Printing the exception highlighting
>
> Printing the exception highlighting can be disabled via a combination of
> an environment variable and a -X option, but collecting the data can only
> be disabled by -OO. The reason is that this data will end up in pyc files,
> so when it is not there, a different kind of pyc file would need to be
> produced, and I really don't want another set of pyc file extensions just
> to deactivate this. Notice also that a configure-time variable won't work,
> because it would cause crashes when reading pyc files produced by an
> interpreter compiled without the flag.
>

I don't think the optional existence of column number information needs a
different kind of pyc file.  Just a flag in a pyc file's header at most.
It isn't a new type of file.


> On Sat, 8 May 2021 at 21:13, Gregory P. Smith  wrote:
>
>>
>>
>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>> pablog...@gmail.com> wrote:
>>
>>> Hi Brett,
>>>
>>> Just to be clear, .pyo files have not existed for a while:
 https://www.python.org/dev/peps/pep-0488/.
>>>
>>>
>>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>>> with -OO, which have the "opt-2" prefix.
>>>
>>> This only kicks in at the -OO level.
>>>
>>>
>>> I will correct the PEP so it reflects this more exactly.
>>>
>>> I personally prefer the idea of dropping the data with -OO since if
 you're stripping out docstrings you're already hurting introspection
 capabilities in the name of memory. Or one could go as far as to introduce
 -Os to do -OO plus dropping this extra data.
>>>
>>>
>>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>>> is using -OO, precisely as we are already dropping other data.
>>>
>>
>> We can't piggyback on -OO as the only way to disable this, it needs to
>> have an option of its own.  -OO is unusable as code that relies on
>> "doc"strings as application data such as
>> http://www.dabeaz.com/ply/ply.html exists.
>>
>> -gps
>>
>>
>>>
>>> Thanks for the clarifications!
>>>
>>>
>>>
>>> On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:
>>>


 On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
 pablog...@gmail.com> wrote:

> Although we were originally not sympathetic with it, we may need to
> offer an opt-out mechanism for those users that care about the impact of
> the overhead of the new data in pyc files
> and in in-memory code objects, as was suggested by some folks (Thomas,
> Yury, and others). For this, we could propose that the functionality will
> be deactivated along with the extra
> information when Python is executed in optimized mode (``python -O``)
> and therefore pyo files will not have the overhead associated with the
> extra required data.
>

 Just to be clear, .pyo files have not existed for a while:
 https://www.python.org/dev/peps/pep-0488/.


> Notice that Python
> already strips docstrings in this mode so it would be "aligned" with
> the current mechanism of optimized mode.
>

 This only kicks in at the -OO level.


>
> Although this complicates the implementation, it certainly is still
> much easier than dealing with compression (and more useful for those that
> don't want the feature). Notice that we also
> expect pessimistic results from compression as offsets would be quite
> random (although predominantly in the range 10 - 120).
>

 I personally prefer the idea of dropping the data with -OO since if
 you're stripping out docstrings you're already hurting introspection
 capabilities in the name of memory. Or one could go as far as to introduce
 -Os to do -OO plus dropping this extra data.

 As for .pyc file size, I personally wouldn't worry about it. If someone
 is that space-constrained they either aren't using .pyc files or are only
 shipping a single set of .pyc files under -OO and skipping source code. And
 .pyc files are an implementation detail of CPython so there  shouldn't be
 too much of a concern for other interpreters.

 -Brett


>
> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> One last note for clarity: that's the increase of size in the stdlib,
>> the increase of size
>> 
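
The "-X option" half of the proposed opt-out can be sketched with machinery that already exists: values passed via -X are exposed to Python code through sys._xoptions. The option name no_highlighting below is purely hypothetical; no spelling was decided in this thread:

```python
import subprocess
import sys

# Launch a child interpreter with a hypothetical -X switch and show that
# the option is visible to Python code via sys._xoptions.  A future
# traceback printer could consult such a flag (or an environment
# variable) to skip the ^^^ highlighting.
result = subprocess.run(
    [sys.executable, "-X", "no_highlighting", "-c",
     "import sys; print('no_highlighting' in sys._xoptions)"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # True
```

Note that this only gates the *printing*; as discussed above, whether the offset data is *collected* is a separate decision tied to pyc file contents.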

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> We can't piggyback on -OO as the only way to disable this, it needs to
have an option of its own.  -OO is unusable as code that relies on
"doc"strings as application data such as http://www.dabeaz.com/ply/ply.html
exists.

-OO is the only sensible way to disable the data. There are two things to
disable:

* The data in pyc files
* Printing the exception highlighting

Printing the exception highlighting can be disabled via a combination of
an environment variable and a -X option, but collecting the data can only
be disabled by -OO. The reason is that this data will end up in pyc files,
so when it is not there, a different kind of pyc file would need to be
produced, and I really don't want another set of pyc file extensions just
to deactivate this. Notice also that a configure-time variable won't work,
because it would cause crashes when reading pyc files produced by an
interpreter compiled without the flag.

On Sat, 8 May 2021 at 21:13, Gregory P. Smith  wrote:

>
>
> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado 
> wrote:
>
>> Hi Brett,
>>
>> Just to be clear, .pyo files have not existed for a while:
>>> https://www.python.org/dev/peps/pep-0488/.
>>
>>
>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>> with -OO, which have the "opt-2" prefix.
>>
>> This only kicks in at the -OO level.
>>
>>
>> I will correct the PEP so it reflects this more exactly.
>>
>> I personally prefer the idea of dropping the data with -OO since if
>>> you're stripping out docstrings you're already hurting introspection
>>> capabilities in the name of memory. Or one could go as far as to introduce
>>> -Os to do -OO plus dropping this extra data.
>>
>>
>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>> is using -OO, precisely as we are already dropping other data.
>>
>
> We can't piggyback on -OO as the only way to disable this, it needs to
> have an option of its own.  -OO is unusable as code that relies on
> "doc"strings as application data such as
> http://www.dabeaz.com/ply/ply.html exists.
>
> -gps
>
>
>>
>> Thanks for the clarifications!
>>
>>
>>
>> On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:
>>
>>>
>>>
>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
 Although we were originally not sympathetic with it, we may need to
 offer an opt-out mechanism for those users that care about the impact of
 the overhead of the new data in pyc files
 and in in-memory code objects, as was suggested by some folks (Thomas,
 Yury, and others). For this, we could propose that the functionality will
 be deactivated along with the extra
 information when Python is executed in optimized mode (``python -O``)
 and therefore pyo files will not have the overhead associated with the
 extra required data.

>>>
>>> Just to be clear, .pyo files have not existed for a while:
>>> https://www.python.org/dev/peps/pep-0488/.
>>>
>>>
 Notice that Python
 already strips docstrings in this mode so it would be "aligned" with
 the current mechanism of optimized mode.

>>>
>>> This only kicks in at the -OO level.
>>>
>>>

 Although this complicates the implementation, it certainly is still
 much easier than dealing with compression (and more useful for those that
 don't want the feature). Notice that we also
 expect pessimistic results from compression as offsets would be quite
 random (although predominantly in the range 10 - 120).

>>>
>>> I personally prefer the idea of dropping the data with -OO since if
>>> you're stripping out docstrings you're already hurting introspection
>>> capabilities in the name of memory. Or one could go as far as to introduce
>>> -Os to do -OO plus dropping this extra data.
>>>
>>> As for .pyc file size, I personally wouldn't worry about it. If someone
>>> is that space-constrained they either aren't using .pyc files or are only
>>> shipping a single set of .pyc files under -OO and skipping source code. And
>>> .pyc files are an implementation detail of CPython so there  shouldn't be
>>> too much of a concern for other interpreters.
>>>
>>> -Brett
>>>
>>>

 On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
 wrote:

> One last note for clarity: that's the increase of size in the stdlib,
> the increase of size
> for pyc files goes from 28.471296MB to 34.750464MB, which is an
> increase of 22%.
>
> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> Some update on the numbers. We have made a draft implementation to
>> corroborate the numbers with some more realistic tests, and it seems that
>> our original calculations were wrong. The actual increase in size is
>> quite a bit bigger than previously advertised:
>>
>> Using bytes object to encode the final object and marshalling that to
>> disk (so using uint8_t) as the underlying
>> type:
>>
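
The scheme being measured stores one uint8_t start offset and one uint8_t end offset per instruction, which suggests a back-of-the-envelope estimate of roughly 2 extra bytes per bytecode instruction, recursing into nested code objects. A hypothetical sketch (the helper name is mine, not from the draft):

```python
import dis

def estimated_overhead(code):
    # Roughly 2 bytes (start + end column, each a uint8_t) per bytecode
    # instruction, plus the same for every nested code object
    # (function bodies, comprehensions, class bodies, ...).
    total = 2 * len(list(dis.get_instructions(code)))
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object
            total += estimated_overhead(const)
    return total

source = "def f(a, b):\n    return a + b\n"
print(estimated_overhead(compile(source, "<example>", "exec")), "bytes (estimate)")
```

This ignores marshalling and alignment overhead, so the real on-disk numbers above (an 8-9% stdlib increase) are the authoritative figures.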

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado 
wrote:

> Hi Brett,
>
> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>
>
> Whoops, my bad, I wanted to refer to the pyc files that are generated
> with -OO, which have the "opt-2" prefix.
>
> This only kicks in at the -OO level.
>
>
> I will correct the PEP so it reflects this more exactly.
>
> I personally prefer the idea of dropping the data with -OO since if you're
>> stripping out docstrings you're already hurting introspection capabilities
>> in the name of memory. Or one could go as far as to introduce -Os to do -OO
>> plus dropping this extra data.
>
>
> This is indeed the plan, sorry for the confusion. The opt-out mechanism is
> using -OO, precisely as we are already dropping other data.
>

We can't piggyback on -OO as the only way to disable this, it needs to
have an option of its own.  -OO is unusable as code that relies on
"doc"strings as application data such as http://www.dabeaz.com/ply/ply.html
exists.

-gps


>
> Thanks for the clarifications!
>
>
>
> On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:
>
>>
>>
>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado 
>> wrote:
>>
>>> Although we were originally not sympathetic with it, we may need to
>>> offer an opt-out mechanism for those users that care about the impact of
>>> the overhead of the new data in pyc files
>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>> Yury, and others). For this, we could propose that the functionality will
>>> be deactivated along with the extra
>>> information when Python is executed in optimized mode (``python -O``)
>>> and therefore pyo files will not have the overhead associated with the
>>> extra required data.
>>>
>>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>>
>>
>>> Notice that Python
>>> already strips docstrings in this mode so it would be "aligned" with
>>> the current mechanism of optimized mode.
>>>
>>
>> This only kicks in at the -OO level.
>>
>>
>>>
>>> Although this complicates the implementation, it certainly is still much
>>> easier than dealing with compression (and more useful for those that don't
>>> want the feature). Notice that we also
>>> expect pessimistic results from compression as offsets would be quite
>>> random (although predominantly in the range 10 - 120).
>>>
>>
>> I personally prefer the idea of dropping the data with -OO since if
>> you're stripping out docstrings you're already hurting introspection
>> capabilities in the name of memory. Or one could go as far as to introduce
>> -Os to do -OO plus dropping this extra data.
>>
>> As for .pyc file size, I personally wouldn't worry about it. If someone
>> is that space-constrained they either aren't using .pyc files or are only
>> shipping a single set of .pyc files under -OO and skipping source code. And
>> .pyc files are an implementation detail of CPython so there  shouldn't be
>> too much of a concern for other interpreters.
>>
>> -Brett
>>
>>
>>>
>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
>>> wrote:
>>>
 One last note for clarity: that's the increase of size in the stdlib,
 the increase of size
 for pyc files goes from 28.471296MB to 34.750464MB, which is an
 increase of 22%.

 On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
 wrote:

> Some update on the numbers. We have made a draft implementation to
> corroborate the numbers with some more realistic tests, and it seems that
> our original calculations were wrong. The actual increase in size is
> quite a bit bigger than previously advertised:
>
> Using bytes object to encode the final object and marshalling that to
> disk (so using uint8_t) as the underlying
> type:
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M Lib
> 70M total
>
> AFTER:
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M Lib
> 76M total
>
> So that's an increase of 8.56 % over the original value. This is
> storing the start offset and end offset with no compression
> whatsoever.
>
> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early
>> discussion about one of the main aspects of the PEP.
>>
>> The work we are preparing is to allow the interpreter to produce more
>> fine-grained error messages, pointing to
>> the source associated to the instructions that are failing. For
>> example:
>>
>> Traceback (most recent call last):
>>
>>   File "test.py", line 14, in 
>>
>> lel3(x)
>>
>> ^^^
>>
>>   File "test.py", line 12, in lel3
>>

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
Hi Brett,

Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.


Whoops, my bad, I wanted to refer to the pyc files that are generated
with -OO, which have the "opt-2" prefix.

This only kicks in at the -OO level.


I will correct the PEP so it reflects this more exactly.

I personally prefer the idea of dropping the data with -OO since if you're
> stripping out docstrings you're already hurting introspection capabilities
> in the name of memory. Or one could go as far as to introduce -Os to do -OO
> plus dropping this extra data.


This is indeed the plan, sorry for the confusion. The opt-out mechanism is
using -OO, precisely as we are already dropping other data.

Thanks for the clarifications!



On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:

>
>
> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado 
> wrote:
>
>> Although we were originally not sympathetic with it, we may need to offer
>> an opt-out mechanism for those users that care about the impact of the
>> overhead of the new data in pyc files
>> and in in-memory code objects, as was suggested by some folks (Thomas,
>> Yury, and others). For this, we could propose that the functionality will
>> be deactivated along with the extra
>> information when Python is executed in optimized mode (``python -O``) and
>> therefore pyo files will not have the overhead associated with the extra
>> required data.
>>
>
> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.
>
>
>> Notice that Python
>> already strips docstrings in this mode so it would be "aligned" with the
>> current mechanism of optimized mode.
>>
>
> This only kicks in at the -OO level.
>
>
>>
>> Although this complicates the implementation, it certainly is still much
>> easier than dealing with compression (and more useful for those that don't
>> want the feature). Notice that we also
>> expect pessimistic results from compression as offsets would be quite
>> random (although predominantly in the range 10 - 120).
>>
>
> I personally prefer the idea of dropping the data with -OO since if you're
> stripping out docstrings you're already hurting introspection capabilities
> in the name of memory. Or one could go as far as to introduce -Os to do -OO
> plus dropping this extra data.
>
> As for .pyc file size, I personally wouldn't worry about it. If someone is
> that space-constrained they either aren't using .pyc files or are only
> shipping a single set of .pyc files under -OO and skipping source code. And
> .pyc files are an implementation detail of CPython so there  shouldn't be
> too much of a concern for other interpreters.
>
> -Brett
>
>
>>
>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
>> wrote:
>>
>>> One last note for clarity: that's the increase of size in the stdlib,
>>> the increase of size
>>> for pyc files goes from 28.471296MB to 34.750464MB, which is an increase
>>> of 22%.
>>>
>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
>>> wrote:
>>>
 Some update on the numbers. We have made a draft implementation to
 corroborate the numbers with some more realistic tests, and it seems that
 our original calculations were wrong. The actual increase in size is
 quite a bit bigger than previously advertised:

 Using bytes object to encode the final object and marshalling that to
 disk (so using uint8_t) as the underlying
 type:

 BEFORE:

 ❯ ./python -m compileall -r 1000 Lib > /dev/null
 ❯ du -h Lib -c --max-depth=0
 70M Lib
 70M total

 AFTER:
 ❯ ./python -m compileall -r 1000 Lib > /dev/null
 ❯ du -h Lib -c --max-depth=0
 76M Lib
 76M total

 So that's an increase of 8.56 % over the original value. This is
 storing the start offset and end offset with no compression
 whatsoever.

 On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
 wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early
> discussion about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated with the instructions that are failing. For
> example:
>
> Traceback (most recent call last):
>
>   File "test.py", line 14, in <module>
>
> lel3(x)
>
> ^^^
>
>   File "test.py", line 12, in lel3
>
> return lel2(x) / 23
>
>^^^
>
>   File "test.py", line 9, in lel2
>
> return 25 + lel(x) + lel(x)
>
> ^^
>
>   File "test.py", line 6, in lel
>
> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>
>  ^
>
> TypeError: 'NoneType' object is not subscriptable
>
> The cost of this is having the start 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Brett Cannon
On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado 
wrote:

> Although we were originally not sympathetic to it, we may need to offer
> an opt-out mechanism for those users that care about the impact of the
> overhead of the new data in pyc files
> and in in-memory code objects, as was suggested by some folks (Thomas, Yury,
> and others). For this, we could propose that the functionality will be
> deactivated along with the extra
> information when Python is executed in optimized mode (``python -O``) and
> therefore pyo files will not have the overhead associated with the extra
> required data.
>

Just to be clear, .pyo files have not existed for a while:
https://www.python.org/dev/peps/pep-0488/.


> Notice that Python
> already strips docstrings in this mode so it would be "aligned" with the
> current mechanism of optimized mode.
>

This only kicks in at the -OO level.


>
> Although this complicates the implementation, it certainly is still much
> easier than dealing with compression (and more useful for those that don't
> want the feature). Notice that we also
> expect poor results from compression, as offsets would be quite
> random (although predominantly in the range 10 - 120).
>

I personally prefer the idea of dropping the data with -OO since if you're
stripping out docstrings you're already hurting introspection capabilities
in the name of memory. Or one could go as far as to introduce -Os to do -OO
plus dropping this extra data.

As for .pyc file size, I personally wouldn't worry about it. If someone is
that space-constrained they either aren't using .pyc files or are only
shipping a single set of .pyc files under -OO and skipping source code. And
.pyc files are an implementation detail of CPython so there shouldn't be
too much of a concern for other interpreters.

-Brett


>
> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
> wrote:
>
>> One last note for clarity: that's the increase of size in the stdlib, the
>> increase of size
>> for pyc files goes from 28.471296MB to 34.750464MB, which is an increase
>> of 22%.
>>
>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
>> wrote:
>>
>>> Some update on the numbers. We have made some draft implementation to
>>> corroborate the
>>> numbers with some more realistic tests and it seems that our original
>>> calculations were wrong.
>>> The actual increase in size is quite a bit bigger than previously advertised:
>>>
>>> Using bytes object to encode the final object and marshalling that to
>>> disk (so using uint8_t) as the underlying
>>> type:
>>>
>>> BEFORE:
>>>
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 70M Lib
>>> 70M total
>>>
>>> AFTER:
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 76M Lib
>>> 76M total
>>>
>>> So that's an increase of 8.56 % over the original value. This is storing
>>> the start offset and end offset with no compression
>>> whatsoever.
>>>
>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
>>> wrote:
>>>
 Hi there,

 We are preparing a PEP and we would like to start some early discussion
 about one of the main aspects of the PEP.

 The work we are preparing is to allow the interpreter to produce more
 fine-grained error messages, pointing to
 the source associated to the instructions that are failing. For example:

 Traceback (most recent call last):

   File "test.py", line 14, in <module>

 lel3(x)

 ^^^

   File "test.py", line 12, in lel3

 return lel2(x) / 23

^^^

   File "test.py", line 9, in lel2

 return 25 + lel(x) + lel(x)

 ^^

   File "test.py", line 6, in lel

 return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)

  ^

 TypeError: 'NoneType' object is not subscriptable

 The cost of this is having the start column number and end
 column number information for every bytecode instruction
 and this is what we want to discuss (there is also some stack cost to
 re-raise exceptions but that's not a big problem in
 any case). Given that column numbers are not very big compared with
 line numbers, we plan to store these as unsigned chars
 or unsigned shorts. We ran some experiments over the standard library
 and we found that the overhead of all pyc files is:

 * If we use shorts, the total overhead is ~3% (total size 28MB and the
 extra size is 0.88 MB).
 * If we use chars, the total overhead is ~1.5% (total size 28 MB and
 the extra size is 0.44MB).

 One of the disadvantages of using chars is that we can only report
 columns from 1 to 255 so if an error happens in a column
 bigger than that then we would have to exclude it (and not show the
 highlighting) for that frame. Unsigned short will 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Jelle Zijlstra
El sáb, 8 may 2021 a las 10:00, Devin Jeanpierre ()
escribió:

> > What are people's thoughts on the feature?
>
> I'm +1, this level of detail in the bytecode is very useful. My main
> interest is actually from the AST though. :) In order to be in the
> bytecode, one assumes it must first be in the AST. That information is
> incredibly useful for refactoring tools like https://github.com/ssbr/refex
> (n.b. author=me) or https://github.com/gristlabs/asttokens (which refex
> builds on). Currently, asttokens actually attempts to re-discover that kind
> of information after the fact, which is error-prone and difficult.
>
The AST already has column offsets (
https://docs.python.org/3.10/library/ast.html#ast.AST.col_offset).
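As a quick sketch of what the AST already exposes (`end_col_offset` is available since Python 3.8):

```python
import ast

tree = ast.parse("x = foo(a, b) + bar(c)")
call = tree.body[0].value.left  # the `foo(a, b)` Call node
# 0-based columns where the call starts and ends on its line
print(call.col_offset, call.end_col_offset)  # → 4 13
```

So tools working from source text already have per-node spans; the proposal is about surfacing equivalent spans at runtime, per bytecode instruction.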


>
> This could also be useful for finer-grained code coverage tracking and/or
> debugging. One can actually imagine highlighting the spans of code which
> were only partially executed: e.g. if only x() were ever executed in "x()
> and y()" . Ned Batchelder once did wild hacks in this space, and maybe this
> proposal could lead in the future to something non-hacky?
> https://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html
> I say "in the future" because it doesn't just automatically work, since as
> I understand it, coverage currently doesn't track spans, but lines hit by
> the line-based debugger. Something else is needed to be able to track which
> spans were hit rather than which lines, and it may be similarly hacky if
> it's isolated to coveragepy. If, for example, enough were exposed to let a
> debugger skip to bytecode for the next different (sub) span, then this
> would be useful for both coverage and actual debugging as you step through
> an expression. This is probably way out of scope for your PEP, but even so,
> the feature may be laying some useful ground work here.
>
> -- Devin
>
> On Fri, May 7, 2021 at 2:52 PM Pablo Galindo Salgado 
> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early discussion
>> about one of the main aspects of the PEP.
>>
>> The work we are preparing is to allow the interpreter to produce more
>> fine-grained error messages, pointing to
>> the source associated with the instructions that are failing. For example:
>>
>> Traceback (most recent call last):
>>
>>   File "test.py", line 14, in <module>
>>
>> lel3(x)
>>
>> ^^^
>>
>>   File "test.py", line 12, in lel3
>>
>> return lel2(x) / 23
>>
>>^^^
>>
>>   File "test.py", line 9, in lel2
>>
>> return 25 + lel(x) + lel(x)
>>
>> ^^
>>
>>   File "test.py", line 6, in lel
>>
>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>
>>  ^
>>
>> TypeError: 'NoneType' object is not subscriptable
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction
>> and this is what we want to discuss (there is also some stack cost to
>> re-raise exceptions but that's not a big problem in
>> any case). Given that column numbers are not very big compared with line
>> numbers, we plan to store these as unsigned chars
>> or unsigned shorts. We ran some experiments over the standard library and
>> we found that the overhead of all pyc files is:
>>
>> * If we use shorts, the total overhead is ~3% (total size 28MB and the
>> extra size is 0.88 MB).
>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>> extra size is 0.44MB).
>>
>> One of the disadvantages of using chars is that we can only report
>> columns from 1 to 255 so if an error happens in a column
>> bigger than that then we would have to exclude it (and not show the
>> highlighting) for that frame. Unsigned short will allow
>> the values to go from 0 to 65535.
>>
>> Unfortunately these numbers are not easily compressible, as every
>> instruction would have very different offsets.
>>
>> There is also the possibility of not doing this based on some build flag
>> on when using -O to allow users to opt out, but given the fact
>> that these numbers can be quite useful to other tools such as coverage
>> tools, tracers, and profilers, adding conditional
>> logic in many places would complicate the implementation considerably and
>> would potentially reduce the usability of those tools, so we prefer
>> not to have the conditional logic. We believe this extra cost is very
>> much worth the better error reporting, but we understand and respect
>> other points of view.
>>
>> Does anyone see a better way to encode this information **without
>> complicating the implementation a lot**? What are people's thoughts on the
>> feature?
>>
>> Thanks in advance,
>>
>> Regards from cloudy London,
>> Pablo Galindo Salgado
>>
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Devin Jeanpierre
> What are people's thoughts on the feature?

I'm +1, this level of detail in the bytecode is very useful. My main
interest is actually from the AST though. :) In order to be in the
bytecode, one assumes it must first be in the AST. That information is
incredibly useful for refactoring tools like https://github.com/ssbr/refex
(n.b. author=me) or https://github.com/gristlabs/asttokens (which refex
builds on). Currently, asttokens actually attempts to re-discover that kind
of information after the fact, which is error-prone and difficult.

This could also be useful for finer-grained code coverage tracking and/or
debugging. One can actually imagine highlighting the spans of code which
were only partially executed: e.g. if only x() were ever executed in "x()
and y()" . Ned Batchelder once did wild hacks in this space, and maybe this
proposal could lead in the future to something non-hacky?
https://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html
I say "in the future" because it doesn't just automatically work, since as
I understand it, coverage currently doesn't track spans, but lines hit by
the line-based debugger. Something else is needed to be able to track which
spans were hit rather than which lines, and it may be similarly hacky if
it's isolated to coveragepy. If, for example, enough were exposed to let a
debugger skip to bytecode for the next different (sub) span, then this
would be useful for both coverage and actual debugging as you step through
an expression. This is probably way out of scope for your PEP, but even so,
the feature may be laying some useful ground work here.

-- Devin

On Fri, May 7, 2021 at 2:52 PM Pablo Galindo Salgado 
wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated with the instructions that are failing. For example:
>
> Traceback (most recent call last):
>
>   File "test.py", line 14, in <module>
>
> lel3(x)
>
> ^^^
>
>   File "test.py", line 12, in lel3
>
> return lel2(x) / 23
>
>^^^
>
>   File "test.py", line 9, in lel2
>
> return 25 + lel(x) + lel(x)
>
> ^^
>
>   File "test.py", line 6, in lel
>
> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>
>  ^
>
> TypeError: 'NoneType' object is not subscriptable
>
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> on when using -O to allow users to opt out, but given the fact
> that these numbers can be quite useful to other tools such as coverage
> tools, tracers, and profilers, adding conditional
> logic in many places would complicate the implementation considerably and
> would potentially reduce the usability of those tools, so we prefer
> not to have the conditional logic. We believe this extra cost is very
> much worth the better error reporting, but we understand and respect
> other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**? What are people's thoughts on the
> feature?
>
> Thanks in advance,
>
> Regards from cloudy London,
> Pablo Galindo Salgado
>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/DB3RTYBF2BXTY6ZHP3Z4DXCRWPJIQUFD/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Ammar Askar
I really like this idea Nathaniel.

We already have a section considering lazy-loading column information in the
draft PEP but obviously it has a pretty high implementation complexity since it
necessitates a change in the pyc format and the importlib machinery.

Long term this might be the way to go for column information and line
information to alleviate the memory burden.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XS435QQOBWWQNU2FY6RVLA4YUXJCN7XF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Antoine Pitrou


You can certainly get fancy and apply delta encoding + entropy
compression, such as done in Parquet, a high-performance data storage
format:
https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5

(the linked paper from Lemire and Boytsov gives a lot of ideas)
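A minimal sketch of just the delta-encoding stage Antoine mentions (without the entropy-coding step that would follow it):

```python
def delta_encode(offsets):
    """Store each column offset as its difference from the previous one.

    Runs of similar offsets become runs of small numbers, which an
    entropy coder (or even zlib) compresses far better than the raw,
    fairly random-looking values.
    """
    prev = 0
    out = []
    for value in offsets:
        out.append(value - prev)
        prev = value
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    prev = 0
    out = []
    for d in deltas:
        prev += d
        out.append(prev)
    return out
```

The round trip is lossless; the win only materializes once the small deltas are fed to a real compressor.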

But it would be weird to apply such level of engineering when we never
bothered compressing docstrings.

Regards

Antoine.



On Fri, 7 May 2021 23:30:46 +0100
Pablo Galindo Salgado  wrote:
> This is actually a very good point. The only disadvantage is that it
> complicates the parsing a bit and we lose the possibility of indexing
> the table by instruction offset.
> 
> On Fri, 7 May 2021 at 23:01, Larry Hastings  wrote:
> 
> > On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
> >
> > Given that column numbers are not very big compared with line numbers, we
> > plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library and
> > we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28MB and the
> > extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44MB).
> >
> > One of the disadvantages of using chars is that we can only report columns
> > from 1 to 255 so if an error happens in a column
> > bigger than that then we would have to exclude it (and not show the
> > highlighting) for that frame. Unsigned short will allow
> > the values to go from 0 to 65535.
> >
> > Are lnotab entries required to be a fixed size?  If not:
> >
> > if column < 255:
> >     lnotab.write_one_byte(column)
> > else:
> >     lnotab.write_one_byte(255)
> >     lnotab.write_two_bytes(column)
> >
> >
> > I might even write four bytes instead of two in the latter case.
> >
> >
> > */arry*
> > ___
> > Python-Dev mailing list -- python-dev@python.org
> > To unsubscribe send an email to python-dev-le...@python.org
> > https://mail.python.org/mailman3/lists/python-dev.python.org/
> > Message archived at
> > https://mail.python.org/archives/list/python-dev@python.org/message/B3SFCZPXIKGO3LM6UJVSJXFIRAZH2R26/
> > Code of Conduct: http://python.org/psf/codeofconduct/
> >  
> 



___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UOCHN5ZY3ERPNWOCO2SJRTCDTEYMYVD7/
Code of Conduct: http://python.org/psf/codeofconduct/
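Larry's variable-width idea quoted above could be fleshed out roughly like this (the `lnotab` writer API in his snippet is hypothetical; here a plain `bytearray` plus `struct` stand in):

```python
import struct

ESCAPE = 255  # sentinel: a two-byte column value follows


def write_column(buf: bytearray, column: int) -> None:
    """Append a column as one byte, or as escape + two bytes if >= 255."""
    if column < ESCAPE:
        buf.append(column)
    else:
        buf.append(ESCAPE)
        buf += struct.pack("<H", column)  # unsigned 16-bit, little-endian


def read_columns(data: bytes):
    """Decode the variable-width stream back into column numbers."""
    i = 0
    while i < len(data):
        if data[i] != ESCAPE:
            yield data[i]
            i += 1
        else:
            yield struct.unpack_from("<H", data, i + 1)[0]
            i += 3
```

The common case costs one byte and the rare wide column costs three, at the price Pablo notes: the table can no longer be indexed directly by instruction offset and must be scanned.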


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Antoine Pitrou
On Fri, 7 May 2021 23:20:55 +0100
Irit Katriel via Python-Dev  wrote:
> On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado 
> wrote:
> 
> >
> > The cost of this is having the start column number and end column number
> > information for every bytecode instruction
> >  
> 
> 
> Is it really every instruction? Or only those that can raise exceptions?

I think almost any instruction can be interrupted with
KeyboardInterrupt (or any other asynchronously-raised exception).

Regards

Antoine.


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HMX327TD72DOTCAE2TGJRKFHF4H4ZWEC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Antoine Pitrou
On Sat, 8 May 2021 02:58:40 +
Neil Schemenauer  wrote:
> On 2021-05-07, Pablo Galindo Salgado wrote:
> > Technically the main concern may be the size of the unmarshalled
> > pyc files in memory, more than the storage size on disk.
> 
> It would be cool if we could mmap the pyc files and have the VM run
> code without an unmarshal step.

What happens if another process mutates or truncates the file while the
CPython VM is executing code from the mapped file?  Crash?

> Instead, could we dump out the pyc data in a format similar to Cap'n
> Proto?  That way no unmarshal is needed.

How do you freeze PyObjects in Cap'n Proto so that no unmarshal is
needed when loading them?

> The benefit would be faster startup times.  The unmarshal step is
> costly.

How costly? Do we have numbers?

> It would mostly solve the concern about these larger
> linenum/colnum tables.  We would only load that data into memory if
> the table is accessed.

Memory-mapped files are accessed with page granularity (4 kB on x86), so
I'm not sure it's that simple.  You would have to make sure to store
those tables in separate sections distant from the actual code areas.

Regards

Antoine.


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/R57VMAURJIA3DZKMRTBK35CTMDS5JCDZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Nathaniel Smith
On Fri, May 7, 2021 at 8:14 PM Neil Schemenauer  wrote:
>
> On 2021-05-07, Pablo Galindo Salgado wrote:
> > Technically the main concern may be the size of the unmarshalled
> > pyc files in memory, more than the storage size on disk.
>
> It would be cool if we could mmap the pyc files and have the VM run
> code without an unmarshal step.  One idea is something similar to
> the Facebook "not another freeze" PR but with a twist.  Their
> approach was to dump out code objects so they could be loaded as if
> they were statically defined structures.
>
> Instead, could we dump out the pyc data in a format similar to Cap'n
> Proto?  That way no unmarshal is needed.  The VM would have to be
> extensively changed to run code in that format.  That's the hard
> part.
>
> The benefit would be faster startup times.  The unmarshal step is
> costly.  It would mostly solve the concern about these larger
> linenum/colnum tables.  We would only load that data into memory if
> the table is accessed.

A simpler version would be to pack just the docstrings/lnotab/column
numbers into a separate part of the .pyc, and store a reference to the
file + offset to load them lazily on demand. No need for mmap.

Could also store them in memory, but with some cheap compression
applied, and decompress on access. None of these get accessed often.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Q2DBRE5YKLTSPVCMUCXPEDXKFCA4UUGQ/
Code of Conduct: http://python.org/psf/codeofconduct/
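Nathaniel's lazy-load suggestion could look roughly like this (the class name and file layout are made up for illustration; a real implementation would live in the importlib machinery): keep only a file reference and offset in memory, and read the table on first access.

```python
class LazyTable:
    """Loads a byte table from disk only when first accessed.

    Hypothetical layout: the table is `length` raw bytes starting at
    `offset` inside `path` (e.g. a section near the end of a .pyc file).
    """

    def __init__(self, path, offset, length):
        self._path = path
        self._offset = offset
        self._length = length
        self._data = None  # not loaded yet

    @property
    def data(self):
        if self._data is None:  # first access pays the I/O cost
            with open(self._path, "rb") as f:
                f.seek(self._offset)
                self._data = f.read(self._length)
        return self._data
```

Tracebacks are rare, so most processes would never pay for the column table at all; the cost is a second read of the .pyc file when one is raised.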


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Neil Schemenauer
On 2021-05-07, Pablo Galindo Salgado wrote:
> Technically the main concern may be the size of the unmarshalled
> pyc files in memory, more than the storage size on disk.

It would be cool if we could mmap the pyc files and have the VM run
code without an unmarshal step.  One idea is something similar to
the Facebook "not another freeze" PR but with a twist.  Their
approach was to dump out code objects so they could be loaded as if
they were statically defined structures.

Instead, could we dump out the pyc data in a format similar to Cap'n
Proto?  That way no unmarshal is needed.  The VM would have to be
extensively changed to run code in that format.  That's the hard
part.

The benefit would be faster startup times.  The unmarshal step is
costly.  It would mostly solve the concern about these larger
linenum/colnum tables.  We would only load that data into memory if
the table is accessed.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UKDLCOTUFNWGSMWWGLH3DJC4AVYZANDM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Although we were originally not sympathetic to it, we may need to offer
an opt-out mechanism for those users that care about the impact of the
overhead of the new data in pyc files
and in in-memory code objects, as was suggested by some folks (Thomas, Yury,
and others). For this, we could propose that the functionality will be
deactivated along with the extra
information when Python is executed in optimized mode (``python -O``) and
therefore pyo files will not have the overhead associated with the extra
required data. Notice that Python
already strips docstrings in this mode so it would be "aligned" with the
current mechanism of optimized mode.

Although this complicates the implementation, it certainly is still much
easier than dealing with compression (and more useful for those that don't
want the feature). Notice that we also
expect poor results from compression, as offsets would be quite
random (although predominantly in the range 10 - 120).
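The char/short trade-off discussed in this thread can be sketched with `struct` (a simplification for illustration only, not the actual pyc layout):

```python
import struct


def pack_offsets(pairs, fmt_char):
    """Pack one (start_col, end_col) pair per instruction.

    fmt_char "B" = unsigned char  (1 byte each, max column 255),
    fmt_char "H" = unsigned short (2 bytes each, max column 65535).
    """
    out = bytearray()
    for start, end in pairs:
        out += struct.pack("<2" + fmt_char, start, end)
    return bytes(out)


# e.g. 1000 instructions with typical small offsets
pairs = [(10, 30)] * 1000
print(len(pack_offsets(pairs, "B")))  # 2000 bytes with chars
print(len(pack_offsets(pairs, "H")))  # 4000 bytes with shorts
```

Chars halve the table at the cost of losing highlighting for any frame whose error span starts or ends past column 255.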

On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
wrote:

> One last note for clarity: that's the increase of size in the stdlib, the
> increase of size
> for pyc files goes from 28.471296MB to 34.750464MB, which is an increase
> of 22%.
>
> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
> wrote:
>
>> Some update on the numbers. We have made some draft implementation to
>> corroborate the
>> numbers with some more realistic tests and it seems that our original
>> calculations were wrong.
>> The actual increase in size is quite a bit bigger than previously advertised:
>>
>> Using bytes object to encode the final object and marshalling that to
>> disk (so using uint8_t) as the underlying
>> type:
>>
>> BEFORE:
>>
>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>> ❯ du -h Lib -c --max-depth=0
>> 70M Lib
>> 70M total
>>
>> AFTER:
>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>> ❯ du -h Lib -c --max-depth=0
>> 76M Lib
>> 76M total
>>
>> So that's an increase of 8.56 % over the original value. This is storing
>> the start offset and end offset with no compression
>> whatsoever.
>>
>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
>> wrote:
>>
>>> Hi there,
>>>
>>> We are preparing a PEP and we would like to start some early discussion
>>> about one of the main aspects of the PEP.
>>>
>>> The work we are preparing is to allow the interpreter to produce more
>>> fine-grained error messages, pointing to
>>> the source associated with the instructions that are failing. For example:
>>>
>>> Traceback (most recent call last):
>>>
>>>   File "test.py", line 14, in <module>
>>>
>>> lel3(x)
>>>
>>> ^^^
>>>
>>>   File "test.py", line 12, in lel3
>>>
>>> return lel2(x) / 23
>>>
>>>^^^
>>>
>>>   File "test.py", line 9, in lel2
>>>
>>> return 25 + lel(x) + lel(x)
>>>
>>> ^^
>>>
>>>   File "test.py", line 6, in lel
>>>
>>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>
>>>  ^
>>>
>>> TypeError: 'NoneType' object is not subscriptable
>>>
>>> The cost of this is having the start column number and end column number
>>> information for every bytecode instruction
>>> and this is what we want to discuss (there is also some stack cost to
>>> re-raise exceptions but that's not a big problem in
>>> any case). Given that column numbers are not very big compared with line
>>> numbers, we plan to store these as unsigned chars
>>> or unsigned shorts. We ran some experiments over the standard library
>>> and we found that the overhead of all pyc files is:
>>>
>>> * If we use shorts, the total overhead is ~3% (total size 28MB and the
>>> extra size is 0.88 MB).
>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>>> extra size is 0.44MB).
>>>
>>> One of the disadvantages of using chars is that we can only report
>>> columns from 1 to 255 so if an error happens in a column
>>> bigger than that then we would have to exclude it (and not show the
>>> highlighting) for that frame. Unsigned short will allow
>>> the values to go from 0 to 65535.
>>>
>>> Unfortunately these numbers are not easily compressible, as every
>>> instruction would have very different offsets.
>>>
>>> There is also the possibility of not doing this based on some build flag
>>> on when using -O to allow users to opt out, but given the fact
>>> that these numbers can be quite useful to other tools such as coverage
>>> tools, tracers, and profilers, adding conditional
>>> logic in many places would complicate the implementation considerably
>>> and would potentially reduce the usability of those tools, so we prefer
>>> not to have the conditional logic. We believe this extra cost is very
>>> much worth the better error reporting, but we understand and respect
>>> other points of view.
>>>
>>> Does anyone see a better way to encode this information **without
>>> complicating the implementation a lot**? What are people's thoughts on the
>>> feature?
>>>
>>> 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Chris Jerdonek
On Fri, May 7, 2021 at 6:39 PM Steven D'Aprano  wrote:

> On Fri, May 07, 2021 at 06:02:51PM -0700, Chris Jerdonek wrote:
>
> > To know what compression methods might be effective, I’m wondering if it
> > could be useful to see separate histograms of, say, the start column
> number
> > and width over the code base. Or for people that really want to dig in,
> > maybe access to the set of all pairs could help. (E.g. maybe a histogram
> of
> > pairs could also reveal something.)
>
> I think this is over-analysing. Do we need to micro-optimize the
> compression algorithm? Let's make the choice simple: live with the size
> increase, or swap to LZ4 compression as Antoine suggested. Analysis
> paralysis is a real risk here.
>
> If there are implementations which cannot support either (MicroPython?)
> they should be free to continue doing things the old way. In other
> words, "fine grained error messages" should be a quality of
> implementation feature rather than a language guarantee.
>
> I understand that the plan is to make this feature optional in any case,
> to allow third-party tools to catch up.
>
> If people really want to do that histogram analysis so that they can
> optimize the choice of compression algorithm, of course they are free to
> do so. But the PEP authors should not feel that they are obliged to do
> so, and we should avoid the temptation to bikeshed over compressors.
>

I'm not sure why you're sounding so negative. Pablo asked for ideas in his
first message to the list:

On Fri, May 7, 2021 at 2:53 PM Pablo Galindo Salgado 
wrote:

> Does anyone see a better way to encode this information **without
> complicating a lot the implementation**?
>

Maybe a large gain can be made with a simple tweak to how the pair is
encoded, but there's no way to know without seeing the distribution. Also,
my reply wasn't about the pyc files on disk but about their representation
in memory, which Pablo later said may be the main concern. So it's not
compression algorithms like LZ4 so much as a method of encoding.

--Chris


>
> (For what it's worth, I like this proposed feature, I don't care about a
> 20-25% increase in pyc file size, but if this leads to adding LZ4
> compression to the stdlib, I like it even more :-)
>
>
> --
> Steve
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/6H2XSRMARU4SX4WRMIO2M4MI4EQASPBC/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
> I'm wondering if it's possible to compromise with one position that's
> not as complete but still gives a good hint:

Even if it is possible, it would be considerably less useful (a lot of users
wanted us to highlight full ranges for syntax errors, and that change was
very well received when we announced it in 3.10) and, most importantly, it
would render the feature much less useful for other tools such as profilers,
coverage tools, and the like.

It will also make the feature less useful for people who want to display even
more information, such as error-reporting tools, IDEs, etc.


On Sat, 8 May 2021 at 02:41, MRAB  wrote:

> On 2021-05-08 01:43, Pablo Galindo Salgado wrote:
> > Some update on the numbers. We have made a draft implementation to
> > corroborate the numbers with some more realistic tests, and it seems
> > that our original calculations were wrong.
> > The actual increase in size is quite a bit bigger than previously advertised:
> >
> > Using a bytes object to encode the final object and marshalling that to
> > disk (so using uint8_t as the underlying type):
> >
> > BEFORE:
> >
> > ❯ ./python -m compileall -r 1000 Lib > /dev/null
> > ❯ du -h Lib -c --max-depth=0
> > 70M Lib
> > 70M total
> >
> > AFTER:
> > ❯ ./python -m compileall -r 1000 Lib > /dev/null
> > ❯ du -h Lib -c --max-depth=0
> > 76M Lib
> > 76M total
> >
> > So that's an increase of 8.56 % over the original value. This is storing
> > the start offset and end offset with no compression
> > whatsoever.
> >
> [snip]
>
> I'm wondering if it's possible to compromise with one position that's
> not as complete but still gives a good hint:
>
> For example:
>
>File "test.py", line 6, in lel
>  return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>^
>
> TypeError: 'NoneType' object is not subscriptable
>
> That at least tells you which subscript raised the exception.
>
>
> Another example:
>
>Traceback (most recent call last):
>  File "test.py", line 4, in 
>print(1 / x + 1 / y)
>^
>ZeroDivisionError: division by zero
>
> as distinct from:
>
>Traceback (most recent call last):
>  File "test.py", line 4, in 
>print(1 / x + 1 / y)
>^
>ZeroDivisionError: division by zero


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread MRAB

On 2021-05-08 01:43, Pablo Galindo Salgado wrote:
Some update on the numbers. We have made a draft implementation to
corroborate the numbers with some more realistic tests, and it seems that
our original calculations were wrong.

The actual increase in size is quite a bit bigger than previously advertised:

Using a bytes object to encode the final object and marshalling that to
disk (so using uint8_t as the underlying type):

BEFORE:

❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
70M     Lib
70M     total

AFTER:
❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
76M     Lib
76M     total

So that's an increase of 8.56 % over the original value. This is storing 
the start offset and end offset with no compression

whatsoever.


[snip]

I'm wondering if it's possible to compromise with one position that's 
not as complete but still gives a good hint:


For example:

  File "test.py", line 6, in lel
return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
  ^

TypeError: 'NoneType' object is not subscriptable

That at least tells you which subscript raised the exception.


Another example:

  Traceback (most recent call last):
File "test.py", line 4, in 
  print(1 / x + 1 / y)
  ^
  ZeroDivisionError: division by zero

as distinct from:

  Traceback (most recent call last):
File "test.py", line 4, in 
  print(1 / x + 1 / y)
  ^
  ZeroDivisionError: division by zero


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Steven D'Aprano
On Fri, May 07, 2021 at 06:02:51PM -0700, Chris Jerdonek wrote:

> To know what compression methods might be effective, I’m wondering if it
> could be useful to see separate histograms of, say, the start column number
> and width over the code base. Or for people that really want to dig in,
> maybe access to the set of all pairs could help. (E.g. maybe a histogram of
> pairs could also reveal something.)

I think this is over-analysing. Do we need to micro-optimize the 
compression algorithm? Let's make the choice simple: live with the size 
increase, or swap to LZ4 compression as Antoine suggested. Analysis 
paralysis is a real risk here.

If there are implementations which cannot support either (MicroPython?) 
they should be free to continue doing things the old way. In other 
words, "fine grained error messages" should be a quality of 
implementation feature rather than a language guarantee.

I understand that the plan is to make this feature optional in any case, 
to allow third-party tools to catch up.

If people really want to do that histogram analysis so that they can 
optimize the choice of compression algorithm, of course they are free to 
do so. But the PEP authors should not feel that they are obliged to do 
so, and we should avoid the temptation to bikeshed over compressors.

(For what it's worth, I like this proposed feature, I don't care about a 
20-25% increase in pyc file size, but if this leads to adding LZ4 
compression to the stdlib, I like it even more :-)
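For a quick sanity check on whether compression would pay off at all, one can compress a synthetic offset table with a stdlib codec. This is only a sketch: zlib stands in for LZ4 (which is not in the stdlib), and the per-instruction data is made up rather than taken from real pyc files.

```python
import random
import zlib

random.seed(0)
table = bytearray()
for _ in range(10_000):  # one (start, end) byte pair per "instruction"
    start = random.randint(0, 60)  # small, clustered start columns
    table += bytes([start, start + random.randint(1, 30)])

raw = len(table)
packed = len(zlib.compress(bytes(table), 6))
print(f"raw={raw} bytes, zlib={packed} bytes")
```

Even this crude experiment shows the tables are far from incompressible, since real column values cluster in a narrow range.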


-- 
Steve


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Chris Jerdonek
On Fri, May 7, 2021 at 5:44 PM Pablo Galindo Salgado 
wrote:

> Some update on the numbers. We have made a draft implementation to
> corroborate the numbers with some more realistic tests, and it seems that
> our original calculations were wrong.
> The actual increase in size is quite a bit bigger than previously advertised:
>
> Using a bytes object to encode the final object and marshalling that to
> disk (so using uint8_t as the underlying type):
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M Lib
> 70M total
>
> AFTER:
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M Lib
> 76M total
>
> So that's an increase of 8.56 % over the original value. This is storing
> the start offset and end offset with no compression
> whatsoever.
>

To know what compression methods might be effective, I’m wondering if it
could be useful to see separate histograms of, say, the start column number
and width over the code base. Or for people that really want to dig in,
maybe access to the set of all pairs could help. (E.g. maybe a histogram of
pairs could also reveal something.)
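A rough sketch of how such a histogram could be gathered today, using the AST's column metadata (col_offset / end_col_offset, available since Python 3.8) as a stand-in for the per-instruction offsets the PEP would add — using expression nodes as a proxy for instructions is an assumption of this sketch:

```python
import ast
from collections import Counter

def span_histogram(source: str) -> Counter:
    """Tally (start column, width) pairs over every expression node."""
    pairs = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.expr) and node.end_col_offset is not None:
            width = node.end_col_offset - node.col_offset
            pairs[(node.col_offset, width)] += 1
    return pairs

hist = span_histogram("x = f(a) + g[b]")
print(hist.most_common(3))
```

Running this over a large code base would show how the (start, width) pairs are distributed, which is what would inform an encoding choice.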

—Chris



> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early discussion
>> about one of the main aspects of the PEP.
>>
>> The work we are preparing is to allow the interpreter to produce more
>> fine-grained error messages, pointing to
>> the source associated to the instructions that are failing. For example:
>>
>> Traceback (most recent call last):
>>
>>   File "test.py", line 14, in 
>>
>> lel3(x)
>>
>> ^^^
>>
>>   File "test.py", line 12, in lel3
>>
>> return lel2(x) / 23
>>
>>^^^
>>
>>   File "test.py", line 9, in lel2
>>
>> return 25 + lel(x) + lel(x)
>>
>> ^^
>>
>>   File "test.py", line 6, in lel
>>
>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>
>>  ^
>>
>> TypeError: 'NoneType' object is not subscriptable
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction
>> and this is what we want to discuss (there is also some stack cost to
>> re-raise exceptions but that's not a big problem in
>> any case). Given that column numbers are not very big compared with line
>> numbers, we plan to store these as unsigned chars
>> or unsigned shorts. We ran some experiments over the standard library and
>> we found that the overhead of all pyc files is:
>>
>> * If we use shorts, the total overhead is ~3% (total size 28MB and the
>> extra size is 0.88 MB).
>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>> extra size is 0.44MB).
>>
>> One of the disadvantages of using chars is that we can only report
>> columns from 1 to 255 so if an error happens in a column
>> bigger than that then we would have to exclude it (and not show the
>> highlighting) for that frame. Unsigned short will allow
>> the values to go from 0 to 65535.
>>
>> Unfortunately these numbers are not easily compressible, as every
>> instruction would have very different offsets.
>>
>> There is also the possibility of not doing this based on some build flag
>> on when using -O to allow users to opt out, but given the fact
>> that these numbers can be quite useful to other tools such as coverage
>> measuring tools, tracers and profilers, adding conditional
>> logic in many places would complicate the implementation considerably and
>> would potentially reduce the usability of those tools, so we prefer
>> not to have the conditional logic. We believe this extra cost is very
>> much worth the better error reporting, but we understand and respect
>> other points of view.
>>
>> Does anyone see a better way to encode this information **without
>> complicating the implementation a lot**? What are people's thoughts on
>> the feature?
>>
>> Thanks in advance,
>>
>> Regards from cloudy London,
>> Pablo Galindo Salgado
>>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
One last note for clarity: that's the increase in size of the whole stdlib
directory; the increase in size of the pyc files alone goes from 28.471296 MB
to 34.750464 MB, which is an increase of 22%.

On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
wrote:

> Some update on the numbers. We have made a draft implementation to
> corroborate the numbers with some more realistic tests, and it seems that
> our original calculations were wrong.
> The actual increase in size is quite a bit bigger than previously advertised:
>
> Using a bytes object to encode the final object and marshalling that to
> disk (so using uint8_t as the underlying type):
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M Lib
> 70M total
>
> AFTER:
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M Lib
> 76M total
>
> So that's an increase of 8.56 % over the original value. This is storing
> the start offset and end offset with no compression
> whatsoever.
>
> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early discussion
>> about one of the main aspects of the PEP.
>>
>> The work we are preparing is to allow the interpreter to produce more
>> fine-grained error messages, pointing to
>> the source associated to the instructions that are failing. For example:
>>
>> Traceback (most recent call last):
>>
>>   File "test.py", line 14, in 
>>
>> lel3(x)
>>
>> ^^^
>>
>>   File "test.py", line 12, in lel3
>>
>> return lel2(x) / 23
>>
>>^^^
>>
>>   File "test.py", line 9, in lel2
>>
>> return 25 + lel(x) + lel(x)
>>
>> ^^
>>
>>   File "test.py", line 6, in lel
>>
>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>
>>  ^
>>
>> TypeError: 'NoneType' object is not subscriptable
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction
>> and this is what we want to discuss (there is also some stack cost to
>> re-raise exceptions but that's not a big problem in
>> any case). Given that column numbers are not very big compared with line
>> numbers, we plan to store these as unsigned chars
>> or unsigned shorts. We ran some experiments over the standard library and
>> we found that the overhead of all pyc files is:
>>
>> * If we use shorts, the total overhead is ~3% (total size 28MB and the
>> extra size is 0.88 MB).
>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>> extra size is 0.44MB).
>>
>> One of the disadvantages of using chars is that we can only report
>> columns from 1 to 255 so if an error happens in a column
>> bigger than that then we would have to exclude it (and not show the
>> highlighting) for that frame. Unsigned short will allow
>> the values to go from 0 to 65535.
>>
>> Unfortunately these numbers are not easily compressible, as every
>> instruction would have very different offsets.
>>
>> There is also the possibility of not doing this based on some build flag
>> on when using -O to allow users to opt out, but given the fact
>> that these numbers can be quite useful to other tools such as coverage
>> measuring tools, tracers and profilers, adding conditional
>> logic in many places would complicate the implementation considerably and
>> would potentially reduce the usability of those tools, so we prefer
>> not to have the conditional logic. We believe this extra cost is very
>> much worth the better error reporting, but we understand and respect
>> other points of view.
>>
>> Does anyone see a better way to encode this information **without
>> complicating the implementation a lot**? What are people's thoughts on
>> the feature?
>>
>> Thanks in advance,
>>
>> Regards from cloudy London,
>> Pablo Galindo Salgado
>>
>>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Some update on the numbers. We have made a draft implementation to
corroborate the numbers with some more realistic tests, and it seems that
our original calculations were wrong.
The actual increase in size is quite a bit bigger than previously advertised:

Using a bytes object to encode the final object and marshalling that to
disk (so using uint8_t as the underlying type):

BEFORE:

❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
70M Lib
70M total

AFTER:
❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
76M Lib
76M total

So that's an increase of 8.56 % over the original value. This is storing
the start offset and end offset with no compression
whatsoever.

On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated to the instructions that are failing. For example:
>
> Traceback (most recent call last):
>
>   File "test.py", line 14, in 
>
> lel3(x)
>
> ^^^
>
>   File "test.py", line 12, in lel3
>
> return lel2(x) / 23
>
>^^^
>
>   File "test.py", line 9, in lel2
>
> return 25 + lel(x) + lel(x)
>
> ^^
>
>   File "test.py", line 6, in lel
>
> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>
>  ^
>
> TypeError: 'NoneType' object is not subscriptable
>
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> on when using -O to allow users to opt out, but given the fact
> that these numbers can be quite useful to other tools such as coverage
> measuring tools, tracers and profilers, adding conditional
> logic in many places would complicate the implementation considerably and
> would potentially reduce the usability of those tools, so we prefer
> not to have the conditional logic. We believe this extra cost is very
> much worth the better error reporting, but we understand and respect
> other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**? What are people's thoughts on
> the feature?
>
> Thanks in advance,
>
> Regards from cloudy London,
> Pablo Galindo Salgado
>
>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Nick Coghlan
On Sat, 8 May 2021, 8:53 am Pablo Galindo Salgado, 
wrote:

> > One thought: could the stored column position not include the
> indentation? Would that help?
>
> The compiler doesn't have easy access to the source, unfortunately,
> so we don't know how much of the line is indentation. This can make life
> a bit harder for other tools, although it can make it easier for reporting
> the exception, as the current traceback display removes indentation.
>


If the lnotab format (or a new data structure on the code object) could
store a line indent offset for each line, each instruction within a line
would only need to record the offset from the end of the indentation.

If we assume "deeply indented code" is the most likely source of
excessively long lines rather than "long expressions and other one line
statements produced by code generators" it may be worth it, but I'm not
sure that's actually true.

If we instead assume long lines are likely to come from code generators,
then we can impose the 255 column limit, and breaking lines at 255 code
points to improve tracebacks would become a quality of implementation issue
for code generators.

The latter assumption seems more likely to be true to me, and if the deep
indentation case does come up, the line offset idea could be pursued later.

Cheers,
Nick.
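A rough illustration of the indent-offset idea (the helper names here are invented for this sketch, not a proposed CPython API): store one indent width per line, and record instruction columns relative to the end of the indentation.

```python
def indent_width(line: str) -> int:
    # Number of leading whitespace characters on the line.
    return len(line) - len(line.lstrip(" \t"))

def encode(line: str, col: int) -> int:
    # Column relative to the first non-whitespace character.
    return col - indent_width(line)

def decode(line: str, rel_col: int) -> int:
    return rel_col + indent_width(line)

line = " " * 40 + "return foo(x)"
col = 47  # absolute column of "foo"
assert encode(line, col) == 7
assert decode(line, encode(line, col)) == col
```

With this scheme a deeply indented line only needs a small relative column per instruction, at the cost of one extra per-line entry.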


>
> On Fri, 7 May 2021 at 23:37, MRAB  wrote:
>
>> On 2021-05-07 22:45, Pablo Galindo Salgado wrote:
>> > Hi there,
>> >
>> > We are preparing a PEP and we would like to start some early discussion
>> > about one of the main aspects of the PEP.
>> >
>> > The work we are preparing is to allow the interpreter to produce more
>> > fine-grained error messages, pointing to
>> > the source associated to the instructions that are failing. For example:
>> >
>> > Traceback (most recent call last):
>> >
>> >File "test.py", line 14, in 
>> >
>> >  lel3(x)
>> >
>> >  ^^^
>> >
>> >File "test.py", line 12, in lel3
>> >
>> >  return lel2(x) / 23
>> >
>> > ^^^
>> >
>> >File "test.py", line 9, in lel2
>> >
>> >  return 25 + lel(x) + lel(x)
>> >
>> >  ^^
>> >
>> >File "test.py", line 6, in lel
>> >
>> >  return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>> >
>> >   ^
>> >
>> > TypeError: 'NoneType' object is not subscriptable
>> >
>> >
>> > The cost of this is having the start column number and end
>> column number
>> > information for every bytecode instruction
>> > and this is what we want to discuss (there is also some stack cost to
>> > re-raise exceptions but that's not a big problem in
>> > any case). Given that column numbers are not very big compared with
>> line
>> > numbers, we plan to store these as unsigned chars
>> > or unsigned shorts. We ran some experiments over the standard library
>> > and we found that the overhead of all pyc files is:
>> >
>> > * If we use shorts, the total overhead is ~3% (total size 28MB and the
>> > extra size is 0.88 MB).
>> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>> the
>> > extra size is 0.44MB).
>> >
>> > One of the disadvantages of using chars is that we can only report
>> > columns from 1 to 255 so if an error happens in a column
>> > bigger than that then we would have to exclude it (and not show the
>> > highlighting) for that frame. Unsigned short will allow
>> > the values to go from 0 to 65535.
>> >
>> [snip] How common are lines longer than 255 characters, anyway?
>>
>> One thought: could the stored column position not include the
>> indentation? Would that help?


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
> One thought: could the stored column position not include the
indentation? Would that help?

The compiler doesn't have easy access to the source, unfortunately, so
we don't know how much of the line is indentation. This can make life
a bit harder for other tools, although it can make it easier for reporting
the exception, as the current traceback display removes indentation.


On Fri, 7 May 2021 at 23:37, MRAB  wrote:

> On 2021-05-07 22:45, Pablo Galindo Salgado wrote:
> > Hi there,
> >
> > We are preparing a PEP and we would like to start some early discussion
> > about one of the main aspects of the PEP.
> >
> > The work we are preparing is to allow the interpreter to produce more
> > fine-grained error messages, pointing to
> > the source associated to the instructions that are failing. For example:
> >
> > Traceback (most recent call last):
> >
> >File "test.py", line 14, in 
> >
> >  lel3(x)
> >
> >  ^^^
> >
> >File "test.py", line 12, in lel3
> >
> >  return lel2(x) / 23
> >
> > ^^^
> >
> >File "test.py", line 9, in lel2
> >
> >  return 25 + lel(x) + lel(x)
> >
> >  ^^
> >
> >File "test.py", line 6, in lel
> >
> >  return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
> >
> >   ^
> >
> > TypeError: 'NoneType' object is not subscriptable
> >
> >
> > The cost of this is having the start column number and end column number
> > information for every bytecode instruction
> > and this is what we want to discuss (there is also some stack cost to
> > re-raise exceptions but that's not a big problem in
> > any case). Given that column numbers are not very big compared with line
> > numbers, we plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library
> > and we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28MB and the
> > extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44MB).
> >
> > One of the disadvantages of using chars is that we can only report
> > columns from 1 to 255 so if an error happens in a column
> > bigger than that then we would have to exclude it (and not show the
> > highlighting) for that frame. Unsigned short will allow
> > the values to go from 0 to 65535.
> >
> [snip] How common are lines longer than 255 characters, anyway?
>
> One thought: could the stored column position not include the
> indentation? Would that help?


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Thanks, Irit for your comment!

> Is it really every instruction? Or only those that can raise exceptions?

Technically only the ones that can raise exceptions, but the majority can,
and restricting this to the set that can raise exceptions has the danger
that the mapping needs to be maintained for new instructions, and that if
some instruction starts raising exceptions when it didn't before, subtle
bugs can be introduced.

On the other hand, I think the stronger argument for doing this on every
instruction is that there are a lot of tools that can find this information
quite useful, such as coverage tools, profilers, state inspection tools and
more. For example, a coverage tool would be able to tell you what part of

x = f(x) if g(x) else y(x)

was actually executed, while currently it will highlight the full line.
Although in this case these instructions can raise exceptions and would be
covered, the distinction is different and the two criteria could lead to
different subsets.

In short, that may be an optimization, but I would prefer to avoid that
complexity, taking into account the other problems it can raise and the
extra complication.
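The coverage argument can be made concrete: every sub-expression of the conditional above already carries a distinct (start, end) column range in the AST, which is exactly what per-instruction offsets would expose at runtime. This sketch only inspects the AST; mapping the spans onto bytecode is the part the PEP would add.

```python
import ast

src = "x = f(x) if g(x) else y(x)"
# Collect the column span of each call sub-expression on the line.
calls = [n for n in ast.walk(ast.parse(src)) if isinstance(n, ast.Call)]
spans = sorted((n.col_offset, n.end_col_offset) for n in calls)
print(spans)  # → [(4, 8), (12, 16), (22, 26)]
```

A coverage tool with access to per-instruction offsets could mark each of those three spans executed or not, instead of the whole line.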

On Fri, 7 May 2021 at 23:21, Irit Katriel 
wrote:

>
>
> On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado 
> wrote:
>
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction
>>
>
>
> Is it really every instruction? Or only those that can raise exceptions?
>
>
>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
> haha, true... Does our parser even have a maximum line length? (I'm not
> suggesting being unlimited or matching that if huge, 64k is already
> ridiculous)

We use Py_ssize_t in some places, but at the end of the day lines and
columns have a limit of INT_MAX, IIRC.


On Fri, 7 May 2021 at 23:35, Gregory P. Smith  wrote:

>
> On Fri, May 7, 2021 at 3:24 PM Pablo Galindo Salgado 
> wrote:
>
>> Thanks a lot Gregory for the comments!
>>
>> An additional cost to this is things that parse text tracebacks not
>>> knowing how to handle it and things that log tracebacks
>>> generating additional output.
>>
>> We should provide a way for people to disable the feature on a process as
>>> part of this while they address tooling and logging issues.  (via the usual
>>> set of command line flag + python env var + runtime API)
>>
>>
>> Absolutely! We were thinking about that and that's easy enough as that is
>> a single conditional on the display function + the extra init configuration.
>>
>> Neither of those is large. While I'd lean towards uint8_t instead of
>>> uint16_t because not even humans can understand a 255 character line so why
>>> bother being pretty about such a thing... Just document the caveat and move
>>> on with the lower value. A future pyc format could change it if a
>>> compelling argument were ever found.
>>
>>
>> I very much agree with you here, but it is worth noting that I have heard
>> the counter-argument that the longer the line is, the more important it may
>> be to distinguish what part of the line is wrong.
>>
>
> haha, true... Does our parser even have a maximum line length? (I'm not
> suggesting being unlimited or matching that if huge, 64k is already
> ridiculous)
>
>
>>
>> A compromise if you want to handle longer lines: A single uint16_t.
>>> Represent the start column in the 9 bits and width in the other 7 bits. (or
>>> any variations thereof)  it's all a matter of what tradeoff you want to
>>> make for space reasons.  encoding as start + width instead of start + end
>>> is likely better anyways if you care about compression as the width byte
>>> will usually be small and thus be friendlier to compression.  I'd
>>> personally ignore compression entirely.
>>
>>
>> I would personally prefer not to implement very tricky compression
>> algorithms because tools may need to parse this and I don't want to
>> complicate the logic a lot. Handling lnotab is already a bit painful and
>> when bugs occur it makes debugging very tricky. Having the possibility to
>> index something based on the index of the instruction is quite a good API
>> in my opinion.
>>
>> Overall doing this is going to be a big win for developer productivity!
>>
>>
>> Thanks! We think that this has a lot of potential indeed! :)
>>
>> Pablo
>>
>>
>>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
This is actually a very good point. The only disadvantage is that it
complicates the parsing a bit and we lose the possibility of indexing
the table by instruction offset.

On Fri, 7 May 2021 at 23:01, Larry Hastings  wrote:

> On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
>
> Given that column numbers are not very big compared with line numbers, we
> plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars. the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Are lnotab entries required to be a fixed size?  If not:
>
> if column < 255:
>     lnotab.write_one_byte(column)
> else:
>     lnotab.write_one_byte(255)
>     lnotab.write_two_bytes(column)
>
>
> I might even write four bytes instead of two in the latter case.
>
>
> /arry


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Gregory P. Smith
On Fri, May 7, 2021 at 3:24 PM Pablo Galindo Salgado 
wrote:

> Thanks a lot Gregory for the comments!
>
> An additional cost to this is things that parse text tracebacks not
>> knowing how to handle it and things that log tracebacks
>> generating additional output.
>
> We should provide a way for people to disable the feature on a process as
>> part of this while they address tooling and logging issues.  (via the usual
>> set of command line flag + python env var + runtime API)
>
>
> Absolutely! We were thinking about that and that's easy enough as that is
> a single conditional on the display function + the extra init configuration.
>
> Neither of those is large. While I'd lean towards uint8_t instead of
>> uint16_t because not even humans can understand a 255 character line so why
>> bother being pretty about such a thing... Just document the caveat and move
>> on with the lower value. A future pyc format could change it if a
>> compelling argument were ever found.
>
>
> I very much agree with you here, but it is worth noting that I have heard
> the counter-argument that the longer the line is, the more important it may
> be to distinguish what part of the line is wrong.
>

haha, true... Does our parser even have a maximum line length? (I'm not
suggesting being unlimited or matching that if huge, 64k is already
ridiculous)


>
> A compromise if you want to handle longer lines: A single uint16_t.
>> Represent the start column in the 9 bits and width in the other 7 bits. (or
>> any variations thereof)  it's all a matter of what tradeoff you want to
>> make for space reasons.  encoding as start + width instead of start + end
>> is likely better anyways if you care about compression as the width byte
>> will usually be small and thus be friendlier to compression.  I'd
>> personally ignore compression entirely.
>
>
> I would personally prefer not to implement very tricky compression
> algorithms because tools may need to parse this and I don't want to
> complicate the logic a lot. Handling lnotab is already a bit painful and
> when bugs occur it makes debugging very tricky. Having the possibility to
> index something based on the index of the instruction is quite a good API
> in my opinion.
>
> Overall doing this is going to be a big win for developer productivity!
>
>
> Thanks! We think that this has a lot of potential indeed! :)
>
> Pablo
>
>
>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread MRAB

On 2021-05-07 22:56, Larry Hastings wrote:

On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
Given that column numbers are not very big compared with line numbers, 
we plan to store these as unsigned chars
or unsigned shorts. We ran some experiments over the standard library 
and we found that the overhead of all pyc files is:


* If we use shorts, the total overhead is ~3% (total size 28MB and the 
extra size is 0.88 MB).
* If we use chars. the total overhead is ~1.5% (total size 28 MB and 
the extra size is 0.44MB).


One of the disadvantages of using chars is that we can only report 
columns from 1 to 255 so if an error happens in a column
bigger than that then we would have to exclude it (and not show the 
highlighting) for that frame. Unsigned short will allow

the values to go from 0 to 65535.


Are lnotab entries required to be a fixed size?  If not:

if column < 255:
    lnotab.write_one_byte(column)
else:
    lnotab.write_one_byte(255)
    lnotab.write_two_bytes(column)


I might even write four bytes instead of two in the latter case.


A slight improvement would be:

if column < 255:
    lnotab.write_one_byte(column)
else:
    lnotab.write_one_byte(255)
    lnotab.write_two_bytes(column - 255)
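That escape-byte scheme, together with its matching decoder, could be sketched as (hypothetical names, not CPython code):

```python
# Minimal sketch of the escape-byte scheme above (names hypothetical):
# columns 0..254 take one byte; anything larger takes an escape byte of
# 255 followed by (column - 255) as a little-endian 16-bit value, so the
# representable range becomes 0..(255 + 65535).

def encode_column(column: int) -> bytes:
    if column < 255:
        return bytes([column])
    return bytes([255]) + (column - 255).to_bytes(2, "little")

def decode_column(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (column, number of bytes consumed at offset)."""
    first = data[offset]
    if first < 255:
        return first, 1
    return 255 + int.from_bytes(data[offset + 1:offset + 3], "little"), 3

assert decode_column(encode_column(42)) == (42, 1)
assert decode_column(encode_column(300)) == (300, 3)
```

The common case stays at one byte, at the cost of entries no longer having a fixed width.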


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread MRAB

On 2021-05-07 22:45, Pablo Galindo Salgado wrote:

Hi there,

We are preparing a PEP and we would like to start some early discussion 
about one of the main aspects of the PEP.


The work we are preparing is to allow the interpreter to produce more 
fine-grained error messages, pointing to

the source associated to the instructions that are failing. For example:

Traceback (most recent call last):

   File "test.py", line 14, in <module>

 lel3(x)

 ^^^

   File "test.py", line 12, in lel3

 return lel2(x) / 23

    ^^^

   File "test.py", line 9, in lel2

 return 25 + lel(x) + lel(x)

 ^^

   File "test.py", line 6, in lel

 return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)

  ^

TypeError: 'NoneType' object is not subscriptable
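As a rough illustration (hypothetical helper, not CPython code), the highlighting above amounts to printing a caret run under a (start, end) column range:

```python
# Rough illustration (not CPython code): rendering a caret underline for a
# (start, end) column range, similar to the proposed traceback highlighting.
# Assumes 0-based, end-exclusive column offsets; names are hypothetical.

def render_frame(line: str, col_start: int, col_end: int, indent: str = "    ") -> str:
    carets = " " * col_start + "^" * (col_end - col_start)
    return f"{indent}{line}\n{indent}{carets}"

print(render_frame("return lel2(x) / 23", 7, 14))
# prints:
#     return lel2(x) / 23
#            ^^^^^^^
```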


The cost of this is having the start column number and end column number 
information for every bytecode instruction
and this is what we want to discuss (there is also some stack cost to 
re-raise exceptions but that's not a big problem in
any case). Given that column numbers are not very big compared with line 
numbers, we plan to store these as unsigned chars
or unsigned shorts. We ran some experiments over the standard library 
and we found that the overhead of all pyc files is:


* If we use shorts, the total overhead is ~3% (total size 28MB and the 
extra size is 0.88 MB).
* If we use chars. the total overhead is ~1.5% (total size 28 MB and the 
extra size is 0.44MB).


One of the disadvantages of using chars is that we can only report 
columns from 1 to 255 so if an error happens in a column
bigger than that then we would have to exclude it (and not show the 
highlighting) for that frame. Unsigned short will allow

the values to go from 0 to 65535.


[snip]
How common are lines longer than 255 characters, anyway?

One thought: could the stored column position not include the indentation?
Would that help?



[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Thanks a lot Gregory for the comments!

An additional cost to this is things that parse text tracebacks not knowing
> how to handle it and things that log tracebacks generating additional
> output.

We should provide a way for people to disable the feature on a process as
> part of this while they address tooling and logging issues.  (via the usual
> set of command line flag + python env var + runtime API)


Absolutely! We were thinking about that and that's easy enough as that is a
single conditional on the display function + the extra init configuration.

Neither of those is large. While I'd lean towards uint8_t instead of
> uint16_t because not even humans can understand a 255 character line so why
> bother being pretty about such a thing... Just document the caveat and move
> on with the lower value. A future pyc format could change it if a
> compelling argument were ever found.


I very much agree with you here, but it is worth noting that I have heard the
counter-argument that the longer the line is, the more important it may be to
distinguish what part of the line is wrong.

A compromise if you want to handle longer lines: A single uint16_t.
> Represent the start column in the 9 bits and width in the other 7 bits. (or
> any variations thereof)  it's all a matter of what tradeoff you want to
> make for space reasons.  encoding as start + width instead of start + end
> is likely better anyways if you care about compression as the width byte
> will usually be small and thus be friendlier to compression.  I'd
> personally ignore compression entirely.


I would personally prefer not to implement very tricky compression
algorithms because tools may need to parse this and I don't want to
complicate the logic a lot. Handling lnotab is already a bit painful and
when bugs occur it makes debugging very tricky. Having the possibility to
index something based on the index of the instruction is quite a good API
in my opinion.
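The indexing property being defended here can be sketched as follows (hypothetical names and layout, not CPython code): with fixed-width entries, a lookup is a single offset computation rather than a scan.

```python
# Sketch of why fixed-width entries keep the table indexable by instruction
# offset: entry i lives at byte 2*i. A variable-length encoding would
# instead require scanning from the start. Names and layout are hypothetical.
import struct

columns = [3, 17, 300, 42]  # one start column per instruction
table = b"".join(struct.pack("<H", c) for c in columns)

def column_for_instruction(table: bytes, i: int) -> int:
    # O(1) lookup: no scan needed because every entry is exactly 2 bytes.
    return struct.unpack_from("<H", table, 2 * i)[0]

assert column_for_instruction(table, 2) == 300
```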

Overall doing this is going to be a big win for developer productivity!


Thanks! We think that this has a lot of potential indeed! :)

Pablo


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Gregory P. Smith
On Fri, May 7, 2021 at 2:50 PM Pablo Galindo Salgado 
wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated to the instructions that are failing. For example:
>
> Traceback (most recent call last):
>
>   File "test.py", line 14, in <module>
>
> lel3(x)
>
> ^^^
>
>   File "test.py", line 12, in lel3
>
> return lel2(x) / 23
>
>^^^
>
>   File "test.py", line 9, in lel2
>
> return 25 + lel(x) + lel(x)
>
> ^^
>
>   File "test.py", line 6, in lel
>
> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>
>  ^
>
> TypeError: 'NoneType' object is not subscriptable
>
>
An additional cost to this is things that parse text tracebacks not knowing
how to handle it and things that log tracebacks generating additional
output.  We should provide a way for people to disable the feature on a
process as part of this while they address tooling and logging issues.
(via the usual set of command line flag + python env var + runtime API)

The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars. the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>

Neither of those is large. While I'd lean towards uint8_t instead of
uint16_t because not even humans can understand a 255 character line so why
bother being pretty about such a thing... Just document the caveat and move
on with the lower value. A future pyc format could change it if a
compelling argument were ever found.


> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> or when using -O to allow users to opt out, but given the fact
> that these numbers can be quite useful to other tools like coverage
> measuring tools, tracers, profilers and the like, adding conditional
> logic in many places would complicate the implementation considerably and
> would potentially reduce the usability of those tools, so we prefer
> not to have the conditional logic. We believe this extra cost is very
> much worth the better error reporting, but we understand and respect
> other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating a lot the implementation**? What are people thoughts on the
> feature?
>

A compromise if you want to handle longer lines: A single uint16_t.
Represent the start column in the 9 bits and width in the other 7 bits. (or
any variations thereof)  it's all a matter of what tradeoff you want to
make for space reasons.  encoding as start + width instead of start + end
is likely better anyways if you care about compression as the width byte
will usually be small and thus be friendlier to compression.  I'd
personally ignore compression entirely.
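That packed encoding could be sketched as (hypothetical names, not CPython code):

```python
# Sketch of the suggested packing: one uint16 per instruction, with the
# start column in the high 9 bits (0..511) and the width in the low
# 7 bits (0..127). Names are hypothetical.

def pack_location(start: int, width: int) -> int:
    if not (0 <= start < 512 and 0 <= width < 128):
        raise ValueError("location does not fit in 9+7 bits")
    return (start << 7) | width

def unpack_location(packed: int) -> tuple[int, int]:
    return packed >> 7, packed & 0x7F

packed = pack_location(43, 7)  # a 7-character range starting at column 43
assert packed < 2 ** 16
assert unpack_location(packed) == (43, 7)
```

This keeps entries fixed-width (so the table stays indexable by instruction) while still covering lines up to 512 columns, at the cost of capping the highlighted width at 127 characters.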

Overall doing this is going to be a big win for developer productivity!

-Greg


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Irit Katriel via Python-Dev
On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado 
wrote:

>
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
>


Is it really every instruction? Or only those that can raise exceptions?


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Technically, the main concern may be the size of the unmarshalled pyc files
in memory, more than the storage size on disk.

On Fri, 7 May 2021, 23:04 Antoine Pitrou,  wrote:

> On Fri, 7 May 2021 22:45:38 +0100
> Pablo Galindo Salgado  wrote:
> >
> > The cost of this is having the start column number and end column number
> > information for every bytecode instruction
> > and this is what we want to discuss (there is also some stack cost to
> > re-raise exceptions but that's not a big problem in
> > any case). Given that column numbers are not very big compared with line
> > numbers, we plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library and
> > we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28MB and the
> > extra size is 0.88 MB).
> > * If we use chars. the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44MB).
>
> More generally, if some people in 2021 are still concerned with the size
> of pyc files (why not), how about introducing a new version of the pyc
> format with built-in LZ4 compression?
>
> LZ4 decompression is extremely fast on modern CPUs (several GB/s) and
> vendoring the C library should be simple.
> https://github.com/lz4/lz4
>
> Regards
>
> Antoine.
>
>


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Gregory P. Smith
On Fri, May 7, 2021 at 3:01 PM Larry Hastings  wrote:

> On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
>
> Given that column numbers are not very big compared with line numbers, we
> plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars. the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Are lnotab entries required to be a fixed size?  If not:
>
> if column < 255:
>     lnotab.write_one_byte(column)
> else:
>     lnotab.write_one_byte(255)
>     lnotab.write_two_bytes(column)
>
If non-fixed size is acceptable, use UTF-8 to encode the column number as
a single codepoint into bytes, and you don't even need to write your
own encode/decode logic for a varint.
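A rough sketch of that trick (hypothetical helpers, with one caveat noted in the comments):

```python
# Sketch of the UTF-8-as-varint trick: a column below 128 costs one byte,
# larger columns 2-4 bytes, with the encode/decode logic coming for free.
# Caveat (not mentioned above): the surrogate range U+D800-U+DFFF is not
# encodable, so a real scheme would have to remap or avoid those values.

def encode_column(column: int) -> bytes:
    return chr(column).encode("utf-8")

def decode_column(data: bytes) -> int:
    return ord(data.decode("utf-8"))

assert len(encode_column(42)) == 1
assert len(encode_column(300)) == 2
assert decode_column(encode_column(300)) == 300
```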

-gps


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Antoine Pitrou
On Fri, 7 May 2021 22:45:38 +0100
Pablo Galindo Salgado  wrote:
> 
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
> 
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars. the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).

More generally, if some people in 2021 are still concerned with the size
of pyc files (why not), how about introducing a new version of the pyc
format with built-in LZ4 compression?

LZ4 decompression is extremely fast on modern CPUs (several GB/s) and
vendoring the C library should be simple.
https://github.com/lz4/lz4
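As a rough illustration of the container idea (using stdlib zlib as a stand-in, since LZ4 itself is a third-party C library; the layout below is hypothetical, not any real pyc format):

```python
# Rough illustration: wrap marshalled code in a compressed container.
# zlib (stdlib) stands in for LZ4 here; the 4-byte size header and the
# overall layout are hypothetical, not any real pyc format.
import marshal
import zlib

def dump_compressed(code_obj) -> bytes:
    raw = marshal.dumps(code_obj)
    return len(raw).to_bytes(4, "little") + zlib.compress(raw)

def load_compressed(blob: bytes):
    size = int.from_bytes(blob[:4], "little")
    raw = zlib.decompress(blob[4:])
    assert len(raw) == size
    return marshal.loads(raw)

code = compile("x = 1 + 2", "<example>", "exec")
assert load_compressed(dump_compressed(code)).co_code == code.co_code
```

With LZ4, the decompression step would be fast enough to barely affect import time, which is the point being made above.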

Regards

Antoine.




[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Larry Hastings

On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
Given that column numbers are not very big compared with line numbers, 
we plan to store these as unsigned chars
or unsigned shorts. We ran some experiments over the standard library 
and we found that the overhead of all pyc files is:


* If we use shorts, the total overhead is ~3% (total size 28MB and the 
extra size is 0.88 MB).
* If we use chars. the total overhead is ~1.5% (total size 28 MB and 
the extra size is 0.44MB).


One of the disadvantages of using chars is that we can only report 
columns from 1 to 255 so if an error happens in a column
bigger than that then we would have to exclude it (and not show the 
highlighting) for that frame. Unsigned short will allow

the values to go from 0 to 65535.


Are lnotab entries required to be a fixed size?  If not:

   if column < 255:
       lnotab.write_one_byte(column)
   else:
       lnotab.write_one_byte(255)
       lnotab.write_two_bytes(column)


I might even write four bytes instead of two in the latter case.


/arry
