[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Christopher Barker
On Sun, Nov 14, 2021 at 10:06 PM Stephen J. Turnbull <
stephenjturnb...@gmail.com> wrote:

> I'm not saying *Python* can't remove anything.  I'm saying downstream,
> *GNU Mailman* has users it *may* want to support.
>

So a project (not to pick on Mailman) may want to support its users
running old versions of the code on new versions of Python?

I've been confused about that kind of thing for years -- e.g. numpy has to
support Python versions from ten years ago so users can get the latest
numpy while running a really old version of Python? I understand that there
are sometimes IT policy issues, like "you have to use the system Python, on
an old system, but you can still install the latest Python packages". And
believe me, I work in an institution with many such irrational policies,
but they are, indeed, irrational, and sometimes I can use the ammunition of
saying, no, I cannot run this application on that old OS. Period -- give
me an updated system to get my job done.

> Unlike Petr, I'm not <0 on removals.  But they are costly, keeping up
> with deprecations is costly.


Absolutely, and hopefully that cost was considered when the deprecation was
made -- we shouldn't second-guess it when it's time to actually remove the
old stuff.

> In this thread, I'm most interested in exploring
> tooling to make it easier for those who express reservations about
> removals.
>

Maybe reviving the future package to cover Python 3 changes. It appears not
to have been updated for two years, which makes sense, as Py2 is no longer
supported. But maybe its infrastructure could be updated to accommodate the
newer changes. It should be an easier job than making a 2-3 compatible
codebase.

I found future really helpful in the 2-3 transition, and one nice thing
about it is that it provided both translation à la 2to3 and
compatibility à la six.

-CHB


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UXMR6X3ZBK3QJZXAXXQVEC52GJAYGL2Y/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Christopher Barker
On Sun, Nov 14, 2021 at 4:53 PM Steven D'Aprano  wrote:

> Out of all the approximately thousand bazillion ways to write obfuscated
> Python code, which may or may not be malicious, why are Unicode
> confusables worth this level of angst and concern?
>

I for one am not full of angst nor particularly concerned. Though it's a
fine idea to inform folks about these issues.

I am, however, surprised and disappointed by the NFKC normalization.

For example, in writing math we often use different scripts to mean
different things (e.g. TeX's Blackboard Bold). So if I were to use some
of the Unicode Mathematical Alphanumeric Symbols, I wouldn't want them
to get normalized.

Then there's the question of when this normalization happens (and when it
doesn't). If one is doing any kind of metaprogramming, even just using
getattr() and setattr(), things could get very confusing:

In [55]: class Junk:
    ...:     헵e퓵픩º = "hello"
    ...:

In [56]: setattr(Junk, "ᵖ햗퐢혯퓽", "print")

In [57]: dir(Junk)
Out[57]:
[...
 '__weakref__',
 'hello',
 'ᵖ햗퐢혯퓽']

In [58]: Junk.hello
Out[58]: 'hello'

In [59]: Junk.헵e퓵픩º
Out[59]: 'hello'

In [60]: Junk.print
---
AttributeError                            Traceback (most recent call last)
----> 1 Junk.print

AttributeError: type object 'Junk' has no attribute 'print'

In [61]: Junk.ᵖ햗퐢혯퓽
---
AttributeError                            Traceback (most recent call last)
----> 1 Junk.ᵖ햗퐢혯퓽

AttributeError: type object 'Junk' has no attribute 'print'

In [62]: getattr(Junk, "ᵖ햗퐢혯퓽")
Out[62]: 'print'

Would a proposal to switch the normalization to NFC only have any hope of
being accepted?

And/or adding normalization to setattr() and maybe other places where names
are set in code?
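
The mismatch in the session above can be reproduced in a few lines. This is
only a sketch: set_normalized() is a hypothetical helper, not an existing
Python API, showing roughly what "adding normalization to setattr()" could
mean. It assumes the NFKC call mirrors what the compiler does to identifiers.

```python
import unicodedata


class Junk:
    pass


# "\u1d56" is MODIFIER LETTER SMALL P; NFKC maps it to a plain "p".
fancy = "\u1d56rint"
assert unicodedata.normalize("NFKC", fancy) == "print"

# setattr() stores the name exactly as given, with no normalization ...
setattr(Junk, fancy, "raw")
assert getattr(Junk, fancy) == "raw"
assert not hasattr(Junk, "print")


def set_normalized(obj, name, value):
    """Hypothetical helper: normalize the name the way the compiler does."""
    setattr(obj, unicodedata.normalize("NFKC", name), value)


# ... whereas the normalized form is what attribute syntax looks up.
set_normalized(Junk, fancy, "normalized")
assert Junk.print == "normalized"
```

Note that the raw, un-normalized attribute is still there afterwards; it can
only be reached through getattr().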

-CHB

-- 
Christopher Barker, PhD (Chris)

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Z2AIS6Y6NVNF5QSD7GMTB76NSP6NAIKV/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Stephen J. Turnbull
Christopher Barker writes:
 > On Sun, Nov 14, 2021 at 8:19 AM Stephen J. Turnbull <
 > stephenjturnb...@gmail.com> wrote:
 > 
 > >  > But do
 > >  > we need to support running the same code on 3.5 to 3.10?
 > >
 > > Need?  No.  Want to not raise a big middle finger to our users?
 > 
 > Note that I said 3.5, not 3.6 -- 3.5 is no longer supported. If we feel the
 > need to be backward compatible with unsupported versions then we can't ever
 > remove anything.

I'm not saying *Python* can't remove anything.  I'm saying downstream,
*GNU Mailman* has users it *may* want to support.

Unlike Petr, I'm not <0 on removals.  But they are costly, keeping up
with deprecations is costly.  In theory, I agree with you that we
should consider maintenance as an almost-fixed cost that a few
deprecations aren't going to increase significantly.  In practice, I'm
not practiced enough to say.

I do see a strong case for pruning stuff that we already found worth
deprecation as well.  In this thread, I'm most interested in exploring
tooling to make it easier for those who express reservations about
removals.

Steve


Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5YTXKZQT4IQSZDLFGT4TRFHIOYHCYITU/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Inada Naoki
On Mon, Nov 15, 2021 at 7:58 AM Victor Stinner  wrote:
>
> On Sun, Nov 14, 2021 at 6:34 PM Eric V. Smith  wrote:
> > On second thought, I guess the existing policy already does this. Maybe
> > we should make it more than 2 versions for deprecations? I've written
> > libraries where I support 4 or 5 released versions. Although maybe I
> > should just trim that back.
>
> If I understood correctly, the problem is more for how long is the new
> way available?
>

I think the main problem is how much user code will be broken, weighed
against the merit of the removal.

For example, PEP 623 will remove some legacy C APIs in Python 3.12.
https://www.python.org/dev/peps/pep-0623/

There are a few modules the PEP will break. But the PEP has
significant merit (it reduces the memory usage of all string objects).
So I want to remove them with the minimum deprecation period and I am
helping people to use new APIs. (*)

* e.g. https://github.com/jamesturk/cjellyfish/pull/12

So I don't want to increase the minimum required deprecation period.
But I agree that a longer deprecation period is good when keeping
deprecated stuff has nearly zero cost.

Regards,

-- 
Inada Naoki

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/3UE3SNH3DG5HE22EZ57NM5BFJ7ZANUJC/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Steven D'Aprano
Out of all the approximately thousand bazillion ways to write obfuscated 
Python code, which may or may not be malicious, why are Unicode 
confusables worth this level of angst and concern?

I looked up "Unicode homoglyph" on CVE, and found a grand total of seven 
hits:

https://www.cvedetails.com/google-search-results.php?q=unicode+homoglyph

all of which appear to be related to impersonation of account names. I 
daresay if I expanded my search terms, I would probably find some more, 
but it is clear that Unicode homoglyphs are not exactly a major threat.

In my opinion, the other Steve's (Stestagg) example of obfuscated code 
with homoglyphs for e (as well as a few similar cases, such as 
homoglyphs for A) mostly makes for an amusing curiosity, perhaps worth a 
plugin for Pylint and other static checkers, but not much more. I'm not 
entirely sure what Paul's more lurid examples are supposed to indicate. 
If your threat relies on a malicious coder smuggling in identifiers like 
"횑퓮햑풍표" or "ªº" and having the reader not notice, then I'm not going to 
lose much sleep over it.

Confusable account names and URL spoofing are proven, genuine threats. 
Beyond that, IMO the actual threat window from confusables is pretty 
small. Yes, you can write obfuscated code, and smuggle in calls to 
unexpected functions:

result = lеn(sequence)  # Cyrillic letter small Ie

but you still have to smuggle in a function to make it work:

def lеn(obj):
    # something malicious
    ...

And if you can do that, the Unicode letter is redundant. I'm not sure 
why any attacker would bother.
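
For the "plugin for Pylint and other static checkers" idea, a minimal sketch
of such a check might look like the following. The function name
non_ascii_names() is made up for illustration, and a real linter would want
a proper confusables table rather than a blanket non-ASCII test.

```python
import io
import tokenize


def non_ascii_names(source):
    """Return (line, name) pairs for identifiers containing non-ASCII characters."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits


# The second "len" uses CYRILLIC SMALL LETTER IE (U+0435) in place of "e".
sample = "a = len(seq)\nb = l\u0435n(seq)\n"
print(non_ascii_names(sample))
```

Only the second line is reported, since the first uses the ordinary ASCII
spelling.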


-- 
Steve

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/XNRW6JSFGO4DQOGVNY2FEZAUBN6P2HRR/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Victor Stinner
On Sun, Nov 14, 2021 at 6:34 PM Eric V. Smith  wrote:
> On second thought, I guess the existing policy already does this. Maybe
> we should make it more than 2 versions for deprecations? I've written
> libraries where I support 4 or 5 released versions. Although maybe I
> should just trim that back.

If I understood correctly, the question is more: for how long has the
new way been available?

For example, if the new way is introduced in Python 3.6 and the old way
is deprecated in Python 3.8, can we remove the old way in Python 3.10?
It means that the new way is available in 4 versions (3.6, 3.7, 3.8,
3.9), before the old way is removed. It means that it's possible to
have a single code base (no test on the Python version and no feature
test) for Python 3.6 and newer.

More concrete examples:

* the "U" open() flag was deprecated since Python 3.0, removed in
Python 3.11: the flag was ignored since Python 3.0, code without "U"
works on Python 3.0 and newer

* collections.abc.MutableMapping exists since Python 3.3:
collections.MutableMapping was deprecated in Python 3.3, removed in
Python 3.10. Using collections.abc.MutableMapping works on Python 3.3
and newer.

* unittest: failIf() alias, deprecated since Python 2.7, was removed
in Python 3.11: assertFalse() always worked.

For these 3 changes, it's possible to keep support back to Python 3.3,
or back to Python 3.0 if you add "try/except ImportError" for
collections.abc.
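
The "try/except ImportError" approach can be sketched as below. The
LowerCaseDict class is purely a hypothetical example of code that would then
run unchanged on old and new versions; it is not taken from any real project.

```python
try:
    # Available since Python 3.3.
    from collections.abc import MutableMapping
except ImportError:
    # Fallback for Python 3.0 to 3.2, where the ABCs lived in collections.
    from collections import MutableMapping


class LowerCaseDict(MutableMapping):
    """Hypothetical example: a mapping with case-insensitive string keys."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

The rest of the code base only ever refers to the name MutableMapping, so the
version check is confined to the import.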

IMO it would help to have a six-like module to write code for the
latest Python version, and keep support for old Python versions. For
example, have hacks to be able to use collections.abc.MutableMapping
on Python 3.2 and older (extreme example, who still cares about Python
older than 3.5 in 2021?).

I wrote something like that for the C API, providing *new* C API
functions to *old* Python versions:
https://github.com/pythoncapi/pythoncapi_compat

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/WD6NLGVI5AXB3POKQHOUKZ5WUR2HBLV2/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Richard Damon

On 11/14/21 2:36 PM, David Mertz, Ph.D. wrote:
> On Sun, Nov 14, 2021, 2:14 PM Christopher Barker
>
>>> It's probably to deal with "é" vs "é", i.e. "\N{LATIN SMALL LETTER
>>> E}\N{COMBINING ACUTE ACCENT}" vs "\N{LATIN SMALL LETTER E WITH ACUTE}",
>>> which are different ways of writing the same thing.
>>
>> Why does someone that wants to use, e.g. "é" in an identifier
>> have to be able to represent it two different ways in a code file?
>
> Imagine that two different programmers work with the same code base,
> and their text editors or keystrokes enter "é" in different ways.
>
> Or imagine just one programmer doing so on two different
> machines/environments.
>
> As an example, I wrote this reply on my Android tablet (with
> such-and-such OS version). I have no idea what actual codepoint(s) are
> entered when I press and hold the "e" key for a couple seconds to pop
> up character variations.
>
> If I wrote it on OSX, I'd probably press "alt-e e" on my US
> International key layout. Again, no idea what codepoints actually are
> entered. If I did it on Linux, I'd use "ctrl-shift u 00e9". In that
> case, I actually know the codepoint.

But you would have to look up the actual number to enter them.

Imagine if ALL your source code had to be entered via code-point numbers.

BTW, you should be able to enable 'composing' under Linux too, just like
under OSX with the right input driver loaded.


--
Richard Damon

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/N76K3RML5QIFW56BRRVUOW5HGKSJAIVA/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread David Mertz, Ph.D.
On Sun, Nov 14, 2021, 2:14 PM Christopher Barker

>> It's probably to deal with "é" vs "é", i.e. "\N{LATIN SMALL LETTER
>> E}\N{COMBINING ACUTE ACCENT}" vs "\N{LATIN SMALL LETTER E WITH ACUTE}",
>> which are different ways of writing the same thing.
>>
>
> Why does someone that wants to use, .e.g. "é" in an identifier have to be
> able to represent it two different ways in a code file?
>

Imagine that two different programmers work with the same code base, and
their text editors or keystrokes enter "é" in different ways.

Or imagine just one programmer doing so on two different
machines/environments.

As an example, I wrote this reply on my Android tablet (with such-and-such
OS version). I have no idea what actual codepoint(s) are entered when I
press and hold the "e" key for a couple seconds to pop up character
variations.

If I wrote it on OSX, I'd probably press "alt-e e" on my US International
key layout. Again, no idea what codepoints actually are entered. If I did
it on Linux, I'd use "ctrl-shift u 00e9". In that case, I actually know the
codepoint.

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/F2GZT7YJTIZWTCPQXDSPVQICE3YK2TZ5/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Richard Damon

On 11/14/21 2:07 PM, Christopher Barker wrote:
> Why does someone that wants to use, e.g. "é" in an identifier have
> to be able to represent it two different ways in a code file?

The issue here is that fundamentally, some editors will produce composed
characters and some decomposed characters to represent the same actual
'character'.

These two methods are defined by Unicode to really represent the same
'character'; it is just that some defined sequences of combining
codepoints happen to have a composed 'abbreviation' defined as well.

Having to exactly match the byte sequence means that some people will
have a VERY hard time entering usable code if their tools support
Unicode but use the other convention.
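
To make the composed/decomposed point concrete, here is a small sketch. The
variable name "café" is just an illustration, and exec() is used only so the
two spellings can be written unambiguously with escapes.

```python
import unicodedata

decomposed = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT
composed = "caf\u00e9"      # LATIN SMALL LETTER E WITH ACUTE

# As plain strings the two spellings differ ...
assert decomposed != composed
# ... but normalization maps them to the same sequence.
assert unicodedata.normalize("NFC", decomposed) == composed

# Identifiers are normalized when source is compiled, so code typed with
# either spelling refers to the same variable.
namespace = {}
exec(decomposed + " = 1", namespace)
exec("result = " + composed + " + 1", namespace)
assert namespace["result"] == 2
```

Without that normalization step, the second exec() would fail with a
NameError even though both lines look identical on screen.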


--
Richard Damon

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/WXGHMDIAY2M77MUMBM4NU7LZTIQTEBNP/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Daniel Pope
On Sun, 14 Nov 2021, 19:07 Christopher Barker,  wrote:

> On Sun, Nov 14, 2021 at 10:27 AM MRAB  wrote:
>
>> Unfortunately, it goes too far, because it's unlikely that we want "ᵖ"
>> ("\N{MODIFIER LETTER SMALL P}") to be equivalent to "p" ("\N{LATIN
>> SMALL LETTER P}").
>>
>
> Is it possible to only capture things like the combining characters and
> not the "equivalent" ones like the above?
>

Yes, that is NFC. NFKC converts compatibility characters to their
equivalents and also composes; NFC just composes.
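
A quick way to see the difference, using only unicodedata from the standard
library (the variable names are just for illustration):

```python
import unicodedata

combining = "e\u0301"      # "e" + COMBINING ACUTE ACCENT
modifier_p = "\u1d56"      # MODIFIER LETTER SMALL P

for form in ("NFC", "NFKC"):
    # Both forms compose the accent into a single code point.
    assert unicodedata.normalize(form, combining) == "\u00e9"

# Only the compatibility form folds the modifier letter into ASCII "p".
nfc_result = unicodedata.normalize("NFC", modifier_p)
nfkc_result = unicodedata.normalize("NFKC", modifier_p)
assert nfc_result == modifier_p
assert nfkc_result == "p"
```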

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/YX5YJBQH4CIAF6GIIE7L54GGWLPAGVGB/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Christopher Barker
On Sun, Nov 14, 2021 at 10:27 AM MRAB  wrote:

> > So why does Python apply NFKC normalization to variable names??

> It's probably to deal with "é" vs "é", i.e. "\N{LATIN SMALL LETTER
> E}\N{COMBINING ACUTE ACCENT}" vs "\N{LATIN SMALL LETTER E WITH ACUTE}",
> which are different ways of writing the same thing.
>

Sure, but this is code, written by humans (or meta-programming). Maybe I'm
showing my English bias, but would it be that limiting to have identifiers
be based on codepoints, period?

Why does someone that wants to use, e.g. "é" in an identifier have to be
able to represent it two different ways in a code file?

But if so ...


> Unfortunately, it goes too far, because it's unlikely that we want "ᵖ"
> ("\N{MODIFIER LETTER SMALL P}") to be equivalent to "p" ("\N{LATIN
> SMALL LETTER P}").
>

Is it possible to only capture things like the combining characters and not
the "equivalent" ones like the above?

-CHB

-- 
Christopher Barker, PhD (Chris)

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QAR3TNRPNW7OXTGWKBDZHNVRKZGMCFZS/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Alex Martelli via Python-Dev
Indeed, normative annex https://www.unicode.org/reports/tr31/tr31-35.html
section 5 says: "if the programming language has case-sensitive
identifiers, then Normalization Form C is appropriate" (vs NFKC for a
language with case-insensitive identifiers) so to follow the standard we
should have used NFC rather than NFKC. Not sure if it's too late to fix
this "oops" in future Python versions.

Alex

On Sun, Nov 14, 2021 at 9:17 AM Christopher Barker 
wrote:

> On Sat, Nov 13, 2021 at 2:03 PM  wrote:
>
>> def 횑퓮햑풍표():
>>     try:
>>         픥e헅핝횘︴ = "Hello"
>>         함픬r퓵ᵈ﹎ = "World"
>>         ᵖ햗퐢혯퓽(f"{헵e퓵픩º_}, {햜ₒ풓lⅆ︴}!")
>>     except 퓣핪ᵖe햤헿ᵣ햔횛 as ⅇ헑c:
>>         풑rℹₙₜ("failed: {}".핗헼ʳᵐªt(ᵉ퐱퓬))
>>
>
> Wow. Just Wow.
>
> So why does Python apply  NFKC normalization to variable names?? I can't
> for the life of me figure out why that would be helpful at all.
>
> The string methods, sure, but names?
>
> And, in fact, the normalization is not used for string comparisons or
> hashes as far as I can tell.
>
> In [36]: weird
> Out[36]: 'ᵖ햗퐢혯퓽'
>
> In [37]: normal
> Out[37]: 'print'
>
> In [38]: eval(weird + "('yup, that worked')")
> yup, that worked
>
> In [39]: weird == normal
> Out[39]: False
>
> In [40]: weird[0] in normal
> Out[40]: False
>
> This seems very odd (and dangerous) to me.
>
> Is there a good reason? and is it too late to change it?
>
> -CHB
>
>> if _︴ⁿ퓪푚핖__ == "__main__":
>>     풉eℓˡ허()
>>
>> # snippet from unittest/util.py
>> _퓟Ⅼ햠홲험ℋ풪Lᴰ푬핽﹏핷피헡 = 12
>>
>> def _픰ʰ퓸ʳ핥홚푛(픰, p푟픢fi햝핝횎푛, sᵤ푓헳헂푥헹ₑ횗):
>>     ˢ헸i헽 = 퐥e혯(햘) - pr횎햋퐢x헅ᵉ퓷 - 풔홪ffi혅헹홚ₙ
>>     if ski혱 > _퐏헟햠혊홴H핺L핯홀혙﹏L픈풩:
>>         혴 = '%s[%d chars]%s' % (홨[:혱퐫핖푓핚xℓ풆핟], ₛ횔풊p, 퓼[퓁풆햓(횜) - 홨횞풇fix홡ᵉ혯:])
>>     return ₛ
>>
>> You should be able to paste these into your local UTF-8-aware editor or IDE
>> and execute them as-is.
>>
>> (If this doesn’t come through, you can also see this as a GitHub gist at
>> "Hello, World rendered in a variety of Unicode characters" (github.com).
>> I have a second gist containing the transformer, but it is still a private
>> gist atm.)
>>
>> Some other discoveries:
>>
>> “·” (U+00B7 MIDDLE DOT, code point 183, which is outside ASCII) is a valid
>> identifier body character, making “_···” a valid Python identifier. This
>> could actually be another security attack point, in which “s·join(‘x’)”
>> could be easily misread as “s.join(‘x’)”, but would actually be a call to
>> a potentially malicious method “s·join”.
>>
>> “_” seems to be a special case for normalization. Only the ASCII “_”
>> character is valid as a leading identifier character; the Unicode
>> characters that normalize to “_” (any of the characters in “︳︴﹍﹎﹏_”) can
>> only be used as identifier body characters. “︳” especially could be
>> misread as “|” followed by a space, when it actually normalizes to “_”.
>>
>> Potential beneficial uses:
>>
>> I am considering taking my transformer code and experimenting with an
>> orthogonal approach to syntax highlighting, using Unicode groups instead of
>> colors. Module names using characters from one group, builtins from
>> another, program variables from another, maybe distinguish local from
>> global variables. Colorizing has always been an obvious syntax highlight
>> feature, but is an accessibility issue for those with difficulty
>> distinguishing colors. Unlike the “ransom note” code above, code
>> highlighted in this way might even be quite pleasing to the eye.
>>
>> -- Paul McGuire
>>
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/GBLXJ2ZTIMLBD2MJQ4VDNUKFFTPPIIMO/
>>
>
>
> --
> Christopher Barker, PhD (Chris)
>

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/U3DJOQKMREWY35SHCRSD6V6FQA2T3SW7/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread MRAB

On 2021-11-14 17:17, Christopher Barker wrote:
> On Sat, Nov 13, 2021 at 2:03 PM wrote:
>
>> def 횑퓮햑풍표():
>>     try:
>>         픥e헅핝횘︴ = "Hello"
>>         함픬r퓵ᵈ﹎ = "World"
>>         ᵖ햗퐢혯퓽(f"{헵e퓵픩º_}, {햜ₒ풓lⅆ︴}!")
>>     except 퓣핪ᵖe햤헿ᵣ햔횛 as ⅇ헑c:
>>         풑rℹₙₜ("failed: {}".핗헼ʳᵐªt(ᵉ퐱퓬))
>
> Wow. Just Wow.
>
> So why does Python apply NFKC normalization to variable names?? I can't
> for the life of me figure out why that would be helpful at all.
>
> The string methods, sure, but names?
>
> And, in fact, the normalization is not used for string comparisons or
> hashes as far as I can tell.



[snip]

It's probably to deal with "é" vs "é", i.e. "\N{LATIN SMALL LETTER 
E}\N{COMBINING ACUTE ACCENT}" vs "\N{LATIN SMALL LETTER E WITH ACUTE}", 
which are different ways of writing the same thing.


Unfortunately, it goes too far, because it's unlikely that we want "ᵖ"
("\N{MODIFIER LETTER SMALL P}") to be equivalent to "p" ("\N{LATIN
SMALL LETTER P}").

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/PNZICEQGVEAQH7KNBCBSS4LPAO25JBF3/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Christopher Barker
On Sun, Nov 14, 2021 at 8:19 AM Stephen J. Turnbull <
stephenjturnb...@gmail.com> wrote:

>  > But do
>  > we need to support running the same code on 3.5 to 3.10?
>
> Need?  No.  Want to not raise a big middle finger to our users?


Note that I said 3.5, not 3.6 -- 3.5 is no longer supported. If we feel the
need to be backward compatible with unsupported versions then we can't ever
remove anything.

> I wouldn't mind if the tool gently suggests, ''' hey, folks, you can't
> really support both 3.5 AND 3.10 without a lot of "if hasattr(foo,
> 'frob')", so maybe you can drop support for 3.5? ''', though.
>

Now I'm confused -- if you need the hasattr() calls, then you aren't
supporting it. I guess I meant:

running the same code without special-case code to handle the differences.
Which is why I said "like 2to3" rather than "like six". I always hated six,
even though it was a necessary evil.

>  > I’m confused — did you mean “sometimes cause dangerous behavior”? That’s
>  > pretty rare isn’t it?
>
> FVO of "often" == "yeah, I've heard enough stories that I worry about
> it", I mean "often".
>

hmm -- is there any way to know which deprecations might actually be
dangerous? -- for instance, it's hard to imagine a name change alone would
be that, but I have a failure of imagination.

Eric V. Smith wrote:
> I write systems that support old versions of python. We just moved to
> 3.7, for example.

But do you need to support no-longer-supported versions of Python? 3.7
is still supported, for just these reasons.

Can we remove stuff that's only needed by unsupported versions of Python?

So you have an application running on 3.5. You really should upgrade Python
anyway. When you do, you will need to run an "update_deprecated_stuff"
script, and test and you're good. Is that too much a burden?

Frankly, even the 2to3 transition was less painful than I thought it would
be -- I had a substantial codebase written for py2 -- we couldn't go to
three for quite some time, as we have a LOT of dependencies, and it was a
while before they were all supported. When we finally made the transition
it was less painful than I thought it would be and it would have been even
less painful if we hadn't had a both-2-and-3 stage. And we've got a bunch
of Cython code that bridges strings between Python and C++, too.

> For all of the testing and signoffs we do for a single release,
> I've calculated that it costs us $5k just to release code to prod, even
> if there are no actual code changes.


But that's a fixed cost -- any maintained codebase is going to need updates
and re-releases. I don't think anyone's suggesting that you do a release
only to remove deprecations.

For the example above -- if ALL you are doing is moving from running on
Python 3.5 to running on a newer version, wouldn't that $5k cost have to be
absorbed anyway?

-CHB



-- 
Christopher Barker, PhD (Chris)

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VIC7RSZ3RSXXOSQFWUMMFJ5HOMMTITC7/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Jim J. Jewett
ptmcg@austin.rr.com wrote:

> ...  add a cautionary section on homoglyphs, specifically citing
> “A” (LATIN CAPITAL LETTER A) and “Α” (GREEK CAPITAL LETTER ALPHA)
> as an example problem pair.

There is a Unicode tech report about confusables, but it is never clear where
to stop.  Are I (upper case I), l (lower case l) and 1 (numeric 1) from ASCII 
already a problem?  And if we do it at all, is there any way to avoid making 
Cyrillic languages second-class?

I'm not quickly finding the contemporary report, but these should be helpful if 
you want to go deeper:

http://www.unicode.org/reports/tr36/
http://unicode.org/reports/tr36/confusables.txt
https://util.unicode.org/UnicodeJsps/confusables.jsp


> I wanted to look a little further at the use of characters in identifiers 
> beyond the standard 7-bit ASCII, and so I found some of these same 
> issues dealing with Unicode NFKC normalization. The first discovery was 
> the overlapping normalization of “ªº” with “ao”. 

Here I don't see the problem.  Things that look slightly different are really 
the same, and you can write it either way.  So you can use what looks like a 
funny font, but the closest it comes to a security risk is that maybe you could 
access something without a casual reader realizing that you are doing so.  They 
would know that you *could* access it, just not that you *did*.

> Some other discoveries:
> “·” (U+00B7 MIDDLE DOT, code point 183) is a valid identifier body character,
> making “_···” a valid Python identifier.

That and the apostrophe are Unicode consortium regrets, because they are
normally punctuation, but there are also languages that use them as letters.
The apostrophe is (supposedly only) used by Afrikaans. I asked a native
speaker about where/how often it was used, and the similarity to Dutch was
enough that Guido felt comfortable excluding it.  (It *may* have been similar
to using the apostrophe for a contraction in English, and saying it therefore
represents a letter, but the scope was clearly smaller.)  But the dot is used
in Catalan, and ... we didn't find anyone ready to say it wouldn't be needed
for sensible identifiers.  It is worth listing as a warning, and linters should
probably complain.
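
As a rough sketch of what such a linter complaint could be based on:
has_middle_dot() below is a hypothetical check, not part of any existing
linter, and it covers only the one character discussed here.

```python
MIDDLE_DOT = "\u00b7"


def has_middle_dot(identifier):
    """Hypothetical lint check: flag identifiers containing U+00B7."""
    return MIDDLE_DOT in identifier


lookalike = "s" + MIDDLE_DOT + "join"
assert lookalike.isidentifier()          # one name, not attribute access
assert not "s.join".isidentifier()       # a real dot is not part of a name
assert has_middle_dot(lookalike)
```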

> “_” seems to be a special case for normalization. Only the ASCII “_”
> character is valid as a leading identifier character; the Unicode 
> characters that normalize to “_” (any of the characters in “︳︴﹍﹎﹏_”)
> can only be used as identifier body characters. “︳” especially could be
> misread as “|” followed by a space, when it actually normalizes to “_”.

So go ahead and warn, but it isn't clear how that could be abused to look like 
something other than a syntax error, except maybe through soft keywords.  (Ha!  
I snuck in a call to async︳def that had been imported with *, and you didn't 
worry about the import *, or the apparently wild cursor position marker, or the 
strange async definition that was never used!  No way I could have just issued 
a call to _flush and done the same thing!)
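
For anyone who wants to check the underscore behaviour locally, here is a small sketch. The characters are written as escapes; treating the last one as U+FF3F (fullwidth low line) is an assumption, since it shows up as a plain "_" in this copy of the message.

```python
import ast
import unicodedata

# Underscore look-alikes, written as escapes so nothing depends on fonts.
LOOKALIKES = "\ufe33\ufe34\ufe4d\ufe4e\ufe4f\uff3f"

for ch in LOOKALIKES:
    folded = unicodedata.normalize("NFKC", ch)
    as_body = ("a" + ch).isidentifier()    # look-alike after a letter
    as_start = (ch + "a").isidentifier()   # look-alike in leading position
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {folded!r} "
          f"body={as_body} start={as_start}")

all_fold_to_underscore = all(
    unicodedata.normalize("NFKC", ch) == "_" for ch in LOOKALIKES
)

# The "async<look-alike>def" call from the joke above is one ordinary name
# once the parser has normalized it.
parsed = ast.parse("async\ufe33def()").body[0].value.func.id
print(parsed)
```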

> Potential beneficial uses:
> I am considering taking my transformer code and experimenting with an
> orthogonal approach to syntax highlighting, using Unicode groups 
> instead of colors. Module names using characters from one group,
> builtins from another, program variables from another, maybe 
> distinguish local from global variables. Colorizing has always been an
> obvious syntax highlight feature, but is an accessibility issue for those
> with difficulty distinguishing colors.

I kind of like the idea, but ... if you're doing it on-the-fly in the editor, 
you could just use different fonts.  If you're actually saving those changes, 
it seems likely to lead to a lot of spurious diffs if anyone uses a different 
editor.
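
To illustrate both halves of that trade-off, here is a minimal sketch of such a transformer. It is not Paul's (still private) code; the function name `to_math_bold` and the choice of the Mathematical Bold block are assumptions made for the example. Normalizing back with NFKC before saving is one possible way to avoid the spurious diffs.

```python
import unicodedata

BOLD_UPPER = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def to_math_bold(name):
    """Restyle ASCII letters as Mathematical Bold; leave other characters alone."""
    out = []
    for ch in name:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


styled = to_math_bold("len")                            # what the editor would display
round_trip = unicodedata.normalize("NFKC", styled)      # what could be saved to disk

# The restyled name still resolves to the builtin, because the parser
# applies NFKC to identifiers before looking them up.
result = eval(styled + "('abc')")
print(round_trip, result)
```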

-jJ
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NPTL43EVT2FF76LXIBBWVHDU6NXH3HF5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Eric V. Smith

On 11/14/2021 11:39 AM, Eric V. Smith wrote:


For things that really are removed (and I won't get into the reasons 
for why something must be removed), I think a useful stance is "we 
won't remove anything that would make it hard to support a single code 
base across all supported Python versions". We'd need to define 
"hard"; maybe "no hasattr calls" would be part of it.


On second thought, I guess the existing policy already does this. Maybe 
we should make it more than 2 versions for deprecations? I've written 
libraries where I support 4 or 5 released versions. Although maybe I 
should just trim that back.


Eric

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XXPSVAC2CYJWBQFMZ6VO6FSEOOAEZ5MZ/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-14 Thread Christopher Barker
On Sat, Nov 13, 2021 at 2:03 PM Paul McGuire wrote:

> def 횑퓮햑풍표():
>     try:
>         픥e헅핝횘︴ = "Hello"
>         함픬r퓵ᵈ﹎ = "World"
>         ᵖ햗퐢혯퓽(f"{헵e퓵픩º_}, {햜ₒ풓lⅆ︴}!")
>     except 퓣핪ᵖe햤헿ᵣ햔횛 as ⅇ헑c:
>         풑rℹₙₜ("failed: {}".핗헼ʳᵐªt(ᵉ퐱퓬))
>

Wow. Just Wow.

So why does Python apply NFKC normalization to variable names? I can't
for the life of me figure out why that would be helpful at all.

The string methods, sure, but names?

And, in fact, the normalization is not used for string comparisons or
hashes as far as I can tell.

In [36]: weird
Out[36]: 'ᵖ햗퐢혯퓽'

In [37]: normal
Out[37]: 'print'

In [38]: eval(weird + "('yup, that worked')")
yup, that worked

In [39]: weird == normal
Out[39]: False

In [40]: weird[0] in normal
Out[40]: False

This seems very odd (and dangerous) to me.
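
The asymmetry in the session above can be reproduced without pasting exotic characters. PEP 3131 specifies that identifiers are converted to NFKC while parsing; strings are left alone. The sketch below uses a single modifier letter (U+1D56) as a stand-in for the fancier spelling, which is a simplification chosen for the example.

```python
import ast
import unicodedata

weird = "\u1d56rint"   # starts with U+1D56 MODIFIER LETTER SMALL P
normal = "print"

# As strings, the two differ: == and hash() work on the raw code points.
strings_equal = weird == normal

# As source code, the name is normalized by the parser before lookup.
parsed_name = ast.parse(weird + "('hi')").body[0].value.func.id

# Applying the same normalization by hand makes the strings compare equal.
equal_after_nfkc = unicodedata.normalize("NFKC", weird) == normal

print(strings_equal, parsed_name, equal_after_nfkc)
```

So code that compares names as strings (for example an allow-list of callable names) would need to normalize first to see what the parser sees.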

Is there a good reason? And is it too late to change it?

-CHB

>
> if _︴ⁿ퓪푚핖__ == "__main__":
>     풉eℓˡ허()
>
> # snippet from unittest/util.py
> _퓟Ⅼ햠홲험ℋ풪Lᴰ푬핽﹏핷피헡 = 12
>
> def _픰ʰ퓸ʳ핥홚푛(픰, p푟픢fi햝핝횎푛, sᵤ푓헳헂푥헹ₑ횗):
>     ˢ헸i헽 = 퐥e혯(햘) - pr횎햋퐢x헅ᵉ퓷 - 풔홪ffi혅헹홚ₙ
>     if ski혱 > _퐏헟햠혊홴H핺L핯홀혙﹏L픈풩:
>         혴 = '%s[%d chars]%s' % (홨[:혱퐫핖푓핚xℓ풆핟], ₛ횔풊p, 퓼[퓁풆햓(횜) - 홨횞풇fix홡ᵉ혯:])
>     return ₛ
>
> You should be able to paste these into your local UTF-8-aware editor or IDE
> and execute them as-is.
>
> (If this doesn’t come through, you can also see this as a GitHub gist at
> "Hello, World rendered in a variety of Unicode characters" (github.com). I
> have a second gist containing the transformer, but it is still a private
> gist atm.)
>
> Some other discoveries:
>
> “·” (U+00B7, decimal 183) is a valid identifier body character, making
> “_···” a valid Python identifier. This could actually be another security
> attack point, in which “s·join(‘x’)” could be easily misread as
> “s.join(‘x’)”, but would actually be a call to a potentially malicious
> method “s·join”.
>
> “_” seems to be a special case for normalization. Only the ASCII “_”
> character is valid as a leading identifier character; the Unicode
> characters that normalize to “_” (any of the characters in “︳︴﹍﹎﹏_”) can
> only be used as identifier body characters. “︳” especially could be
> misread as “|” followed by a space, when it actually normalizes to “_”.
>
> Potential beneficial uses:
>
> I am considering taking my transformer code and experimenting with an
> orthogonal approach to syntax highlighting, using Unicode groups instead of
> colors. Module names using characters from one group, builtins from
> another, program variables from another, maybe distinguish local from
> global variables. Colorizing has always been an obvious syntax highlight
> feature, but is an accessibility issue for those with difficulty
> distinguishing colors. Unlike the “ransom note” code above, code
> highlighted in this way might even be quite pleasing to the eye.
>
> -- Paul McGuire
>
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/GBLXJ2ZTIMLBD2MJQ4VDNUKFFTPPIIMO/
>


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7RPAZIGJYTOR76IVTAI5NA5NA2HEHDPE/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Eric V. Smith

On 11/12/2021 5:55 AM, Petr Viktorin wrote:


If deprecation now means "we've come up with a new way to do things, 
and you have two years to switch", can we have something else that 
means "there's now a better way to do things; the old way is a bit 
worse but continues to work as before"?


I think optparse is a good example of this (not that I love argparse).

For things that really are removed (and I won't get into the reasons 
for why something must be removed), I think a useful stance is "we won't 
remove anything that would make it hard to support a single code base 
across all supported Python versions". We'd need to define "hard"; maybe 
"no hasattr calls" would be part of it.
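
For readers unfamiliar with the pattern being ruled out, this is roughly what a "hasattr call" looks like in a single code base that spans several Python versions. The example uses `math.lcm` (added in Python 3.9) purely as an illustration; it is a feature addition rather than a removal, but the shim has the same shape either way, and the fallback is a simplified two-argument version.

```python
import math

if hasattr(math, "lcm"):
    # Newer interpreters: use the standard library function directly.
    lcm = math.lcm
else:
    # Older interpreters: simplified two-argument fallback for illustration.
    def lcm(a, b):
        if a == 0 or b == 0:
            return 0
        return abs(a * b) // math.gcd(a, b)


print(lcm(4, 6))
```

Each such branch has to be written, tested on every supported version, and eventually deleted, which is the cost the proposed stance is trying to bound.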


Reliable tools to make the migration between versions would help, too.

I could live with this, although I write systems that support old 
versions of python. We just moved to 3.7, for example.


Eric

PS: Someone else said that my estimate of tens of thousands of dollars 
to deal with deprecations is too high. If anything, I think it's too 
low. For all of the testing and signoffs we do for a single release, 
I've calculated that it costs us $5k just to release code to prod, even 
if there are no actual code changes. Could that be improved? Sure. Will 
it? Unlikely. Maybe I'm an outlier, but I doubt it.



Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5WCUZD3W3RDZXL4VKMGGD6D6ATFWAEQA/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-14 Thread Stephen J. Turnbull
Christopher Barker writes:
 > On Sat, Nov 13, 2021 at 12:01 AM Stephen J. Turnbull
 > 
 > > What I think would make a difference is a six-like tool for making
 > > "easy changes" like substituting aliases and maybe marking other stuff
 > > that requires human brains to make the right changes.
 > 
 > 
 > I think a “2to3” like or “futurize” like tool is a better idea, but
 > yes.

That's what I meant, thanks for the correction.
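
As a very rough sketch of the "easy changes" half of such a tool, the snippet below scans source code for a few deprecated unittest method aliases and reports the suggested replacement. The alias table is a small illustrative subset and the function name `suggest_renames` is made up; it only flags candidates and leaves the actual edit (and anything ambiguous) to a human.

```python
import ast

# Illustrative subset of deprecated unittest alias names and their replacements.
ALIASES = {
    "assertEquals": "assertEqual",
    "assertNotEquals": "assertNotEqual",
    "assert_": "assertTrue",
    "assertRaisesRegexp": "assertRaisesRegex",
}


def suggest_renames(source):
    """Return sorted (line, old, new) tuples for attribute names found in ALIASES."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr in ALIASES:
            hits.append((node.lineno, node.attr, ALIASES[node.attr]))
    return sorted(hits)


sample = (
    "class T(unittest.TestCase):\n"
    "    def test_it(self):\n"
    "        self.assertEquals(1, 1)\n"
    "        self.assert_(True)\n"
)
report = suggest_renames(sample)
for line, old, new in report:
    print(f"line {line}: {old} -> {new}")
```

Matching on attribute names alone can produce false positives (any object with an `assert_` attribute would be flagged), which is why this sketch reports instead of rewriting.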

 > The real challenge with the 2-3 transition was that many of us
 > needed to keep the same code base running on both 2 and 3. But do
 > we need to support running the same code on 3.5 to 3.10?

Need?  No.  Want to not raise a big middle finger to our users?  Yes.
Speaking for GNU Mailman (and I think I can do that in this case
without getting lynched by the rest of our crew).

I wouldn't mind if the tool gently suggests, ''' hey, folks, you can't
really support both 3.5 AND 3.10 without a lot of "if hasattr(foo,
'frob')", so maybe you can drop support for 3.5? ''', though.

 > I’m confused — did you mean “sometimes cause dangerous behavior”? That’s
 > pretty rare isn’t it?

FVO of "often" == "yeah, I've heard enough stories that I worry about
it", I mean "often".

We're talking about risk assessments, and I work with Internet-facing
code, where there are no risks, just certain disasters at an uncertain
but near-future date. ;-)

Steve
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XKDXS3AGTBWRK2MFTM6OOP7WE6UM4PMF/