[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-02 Thread Stephen J. Turnbull
Jim J. Jewett writes: > At the time, we considered it, and we also considered a narrower > restriction on using multiple scripts in the same identifier, or at > least the same identifier portion (so it was OK if separated by > _). This would ban "παν語", aka "pango". That's arguably a good
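
A minimal sketch of the narrower restriction mentioned above (no mixing of scripts within one underscore-separated portion of an identifier). Python's standard library does not expose the Unicode Script property, so this sketch assumes the first word of each character's Unicode name is a good enough stand-in for its script; that is a rough heuristic, not what any actual proposal specified. The helper names `scripts_in` and `mixed_script_portions` are hypothetical.

```python
import unicodedata


def scripts_in(part):
    """Approximate the scripts used by the letters in one identifier portion.

    Assumption: the first word of the Unicode character name (e.g. GREEK,
    LATIN, CJK) is used as a crude proxy for the Script property.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in part
        if ch.isalpha()
    }


def mixed_script_portions(identifier):
    """Return the underscore-separated portions that mix several scripts."""
    return [p for p in identifier.split("_") if len(scripts_in(p)) > 1]


print(mixed_script_portions("παν語"))   # one portion mixing Greek and CJK
print(mixed_script_portions("παν_語"))  # scripts separated by "_"
```

Under this heuristic "παν語" is flagged, while "παν_語" is not because each portion uses a single script.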

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-02 Thread Stephen J. Turnbull
Serhiy Storchaka writes: > All control characters except CR, LF, TAB and FF are banned outside > comments and string literals. I think it is worth to ban them in > comments and string literals too. +1 > > For homoglyphs/confusables, should there be a SyntaxWarning when an > > identifier

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Stephen J. Turnbull
Serhiy Storchaka writes: > This is excellent! > > 01.11.21 14:17, Petr Viktorin wrote: > >> CPython treats the control character NUL (``\0``) as end of input, > >> but many editors simply skip it, possibly showing code that Python > >> will not > >> run as a regular part of a file. > >

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Jim J. Jewett
Chris Angelico wrote: > I'm not sure how a linter would stop > someone from publishing code on PyPI that causes confusion by its > character encoding, for instance. If it becomes important, the cheeseshop backend can run various validations (including a linter) on submissions, and include those

[Python-Dev] PEP 663:

2021-11-02 Thread Ethan Furman
See the latest changes, which are mostly a (hopefully) improved abstract, better tables, and some slight rewordings. Feedback welcome! --- PEP: 663 Title: Standardizing Enum str(), repr(), and format() behaviors Version: $Revision$ Last-Modified: $Date$ Author:

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Kyle Stanley
I'd suggest both: briefer, easier to read write up for average user in docs, more details/semantics in informational PEP. Thanks for working on this, Petr! On Tue, Nov 2, 2021 at 2:07 PM David Mertz, Ph.D. wrote: > This is an amazing document, Petr. Really great work! > > I think I agree with

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Chris Angelico
On Wed, Nov 3, 2021 at 11:09 AM Steven D'Aprano wrote: > > On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote: > > On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote: > > > Let me know if it's clear in the newest version, with this note: > > > > > > > Here, ``encoding:

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Steven D'Aprano
On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote: > On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote: > > Let me know if it's clear in the newest version, with this note: > > > > > Here, ``encoding: unicode_escape`` in the initial comment is an encoding > > > declaration. The

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Terry Reedy
On 11/2/2021 1:02 PM, Marc-Andre Lemburg wrote: On 01.11.2021 13:17, Petr Viktorin wrote: PEP: Title: Unicode Security Considerations for Python Author: Petr Viktorin Status: Active Type: Informational Content-Type: text/x-rst Created: 01-Nov-2021 Post-History: Thanks for writing this

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Chris Angelico
On Wed, Nov 3, 2021 at 5:07 AM David Mertz, Ph.D. wrote: > > This is an amazing document, Petr. Really great work! > > I think I agree with Marc-André that putting it in the actual Python > documentation would give it more visibility than in a PEP. > There are quite a few other PEPs that have

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread David Mertz, Ph.D.
This is an amazing document, Petr. Really great work! I think I agree with Marc-André that putting it in the actual Python documentation would give it more visibility than in a PEP. On Tue, Nov 2, 2021, 1:06 PM Marc-Andre Lemburg wrote: > On 01.11.2021 13:17, Petr Viktorin wrote: > >> PEP:

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Marc-Andre Lemburg
On 01.11.2021 13:17, Petr Viktorin wrote: >> PEP: >> Title: Unicode Security Considerations for Python >> Author: Petr Viktorin >> Status: Active >> Type: Informational >> Content-Type: text/x-rst >> Created: 01-Nov-2021 >> Post-History: Thanks for writing this up. I'm not sure whether a

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-02 Thread Jim J. Jewett
Serhiy Storchaka wrote: > 02.11.21 16:16, Petr Viktorin wrote: > > As for \0, can we ban all ASCII & C1 control characters except > > whitespace? I see no place for them in source code. > All control characters except CR, LF, TAB and FF are banned outside > comments and string literals. I think it

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Chris Angelico
On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote: > Let me know if it's clear in the newest version, with this note: > > > Here, ``encoding: unicode_escape`` in the initial comment is an encoding > > declaration. The ``unicode_escape`` encoding instructs Python to treat > > ``\u0027`` as a
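
A small illustration of the `unicode_escape` codec that the quoted note refers to. It only shows how the codec decodes an escape sequence stored as plain ASCII bytes; it does not reproduce the example file from the draft being discussed.

```python
# Six ASCII bytes as they might appear in a source file: \ u 0 0 2 7
raw = b"\\u0027"

# The unicode_escape codec turns the escape sequence into the character
# U+0027 (APOSTROPHE), which a reader of the raw file never sees as a quote.
decoded = raw.decode("unicode_escape")

print(len(raw), repr(decoded))
```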

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-02 Thread Serhiy Storchaka
02.11.21 16:16, Petr Viktorin wrote: > As for \0, can we ban all ASCII & C1 control characters except > whitespace? I see no place for them in source code. All control characters except CR, LF, TAB and FF are banned outside comments and string literals. I think it is worth to ban them in comments
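
A hedged sketch of how a linter-style check could find the control characters discussed here. It assumes "control character" means Unicode general category Cc (the ASCII C0 and C1 controls) and allows CR, LF, TAB and FF, as in the message above. The function name `find_control_chars` is hypothetical, and this is not how CPython's tokenizer implements its own check.

```python
import unicodedata

# Control characters the message above lists as permitted in source code.
ALLOWED = {"\r", "\n", "\t", "\f"}


def find_control_chars(source):
    """Return (line, column, code point) for each disallowed control character.

    Lines and columns are 1-based. Lines are split on "\n" only, so other
    control characters are reported rather than treated as line breaks.
    """
    hits = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED:
                hits.append((line_no, col_no, f"U+{ord(ch):04X}"))
    return hits


print(find_control_chars("x = 1\n# a\x00b\n"))
```

Because the scan runs over the whole text, it also reports control characters inside comments and string literals, which is the stricter behaviour suggested in the message.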

[Python-Dev] Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-02 Thread Petr Viktorin
On 01. 11. 21 18:32, Serhiy Storchaka wrote: This is excellent! 01.11.21 14:17, Petr Viktorin wrote: CPython treats the control character NUL (``\0``) as end of input, but many editors simply skip it, possibly showing code that Python will not run as a regular part of a file. It is an

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Petr Viktorin
On 01. 11. 21 13:17, Petr Viktorin wrote: Hello, Today, an attack called "Trojan source" was revealed, where a malicious contributor can use Unicode features (left-to-right text and homoglyphs) to write code that, when shown in an editor, will look different from how a computer language parser will