[Python-Dev] Re: What to do about invalid escape sequences

2019-08-15 Thread Rob Cliffe via Python-Dev



On 15/08/2019 12:17:36, Petr Viktorin wrote:

On 8/15/19 10:40 AM, Greg Ewing wrote:

If we want a truly raw string format that allows all characters,
including any kind of quote, we could take a tip from Fortran:

 s = 31HThis is a "totally raw" string!


Or from Rust:

let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;
___
I rather like the idea!  (Even though it would add to the proliferation 
of string types.)
Obviously Python can't use # as the special character since that 
introduces a comment,
and a lot of other possibilities are excluded because they would lead to 
ambiguous syntax.
Say for the sake of argument we used "!" (exclamation mark). Possible 
variations include:

(1) Like Rust:
    s = r"Here's a raw string";
    s = r!"Here's a raw string with "quotes" in it"!;
    s = r!!"Here's r!"the raw string syntax"! in raw string"!!;
    s = r!!!"and here's a '!!"' as well"!!!;
(2) Same, but omit the leading 'r' when using !:
    s = r"Here's a raw string";
    s = !"Here's a raw string with "quotes" in it"!;
    s = !!"Here's a raw string with "quotes" and !exclamation marks! in it"!!;
    s = !!!"and here's a '!!"' as well"!!!;
    # Cons: Would conflict with adding ! as an operator (or at minimum,
    # as a unary operator) for some other purpose in future.
    #    Makes it less obvious that a !string! is a raw string.
(3) Allow the user to specify his own delimiting character:
    s = r!|This raw string can't contain a "bar".|
(4) As above, but the "!" is not required:
    s = r|This raw string can't contain a "bar".|
    # In this case the delimiter ought not to be a letter
    # (it might conflict with current or future string prefixes);
    # this could be forbidden.
(5) Similar, but allow the user to specify his own delimiting *string*
(specified between "!"s) (as long as it doesn't contain !):
    let s = r!?@!Could this string contain almost anything? Yes!?@
    # The text in this string is:
    #    Could this string contain almost anything? Yes!
(6) Same except the first "!" is not required.  In this case the first
character of the delimiting string should not be a letter:
    let s = r?@!Could this string contain almost anything? Yes!?@
    # The text in this string is:
    #    Could this string contain almost anything? Yes!


I can dream ...

A point about the current syntax: It is not true that a raw string can't
end in a backslash, as https://en.wikipedia.org/wiki/String_literal
points out.  It can't end in an *odd number* of backslashes.  42 is fine,
43 is no good.  Which makes it seem even more of a language wart
(think of program-generated art).
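
The parity rule can be checked by asking the compiler directly. The
sketch below builds the source text of raw-string literals made of 42 and
43 backslashes and tries to compile each one. The helper names `compiles`
and `raw_literal` are made up for this illustration.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is accepted as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


def raw_literal(n_backslashes: int) -> str:
    """Build the source text of a raw string made of n backslashes."""
    return 'r"' + "\\" * n_backslashes + '"'


even_ok = compiles(raw_literal(42))  # even count: the literal terminates
odd_ok = compiles(raw_literal(43))   # odd count: the last \ swallows the quote
value = eval(raw_literal(42))        # all 42 backslashes are kept verbatim
```

With an even count the backslashes pair up and the closing quote is seen;
with an odd count the final backslash pairs with the quote and the
literal never ends.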

Rob Cliffe
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AZVQBRODB64WAP22J4VSVOBAIEKLUMB5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-15 Thread Glenn Linderman

On 8/15/2019 4:17 AM, Petr Viktorin wrote:

On 8/15/19 10:40 AM, Greg Ewing wrote:

If we want a truly raw string format that allows all characters,
including any kind of quote, we could take a tip from Fortran:

 s = 31HThis is a "totally raw" string!


Or from Rust:

let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;


Indeed, Fortran has raw strings, but comes with the disadvantage of 
having to count characters. This is poor form when edits want to change 
the length of the string, although it might be advantageous if the 
string must fit into a certain fixed-width on a line printer. Let's not 
go there.
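
To make the counting problem concrete, here is a minimal sketch of how a
length-prefixed (Hollerith-style) constant could be decoded. The function
name `parse_hollerith` is hypothetical and this is not any real Fortran
tooling, only an illustration of the idea.

```python
def parse_hollerith(source: str) -> str:
    """Decode one constant of the form <count>H<text>."""
    count_text, sep, rest = source.partition("H")
    if not sep or not count_text.isdigit():
        raise ValueError("expected <count>H<text>")
    count = int(count_text)
    if len(rest) < count:
        raise ValueError("text is shorter than the declared count")
    # Exactly `count` characters belong to the string, whatever they are.
    return rest[:count]


original = parse_hollerith('31HThis is a "totally raw" string!')
# Editing the text without updating the count silently truncates it.
edited = parse_hollerith('31HThis is a "totally raw" string, edited!')
```

Any character can appear in the text, but the second call shows the
cost: the count has to be recomputed on every edit.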


Without reading the Rust spec, but from your examples, it seems that 
Rust has borrowed concepts from Perl's q and qq operators, both of which 
allowed specification of any non-alphanumeric character as the 
delimiter. Not sure if that included Unicode characters (certainly not 
in the early days before Unicode support was added), but it did have a 
special case for paired characters such as <> [] {} to allow those pairs 
to be used as delimiters, and still allow properly nested instances of 
themselves inside the string.


It looks like Rust might only allow #, but any number of them, to 
delimit raw strings. This is sufficient, but for overly complex raw 
strings containing lots of # character sequences, it could get 
cumbersome, and starts to border on the problems of the Fortran 
solution, where character counting is an issue, whereas the choice of an 
alternative character or character sequence would result in a simpler 
syntax.
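
As a rough illustration of the hash-delimited scheme (inferred from the
examples quoted above, not from the Rust specification), a decoder only
has to count the leading # characters and then look for a quote followed
by the same number of them. The name `parse_hash_raw` is hypothetical.

```python
def parse_hash_raw(source: str) -> str:
    """Decode a Rust-style raw literal such as r##"..."##."""
    if not source.startswith("r"):
        raise ValueError("expected an r prefix")
    body = source[1:]
    hashes = len(body) - len(body.lstrip("#"))  # number of leading '#'
    if not body[hashes:].startswith('"'):
        raise ValueError("expected an opening quote")
    closer = '"' + "#" * hashes                 # quote plus the same hashes
    start = hashes + 1
    end = body.find(closer, start)
    if end == -1:
        raise ValueError("unterminated raw literal")
    return body[start:end]


plain = parse_hash_raw('''r"Here's a raw string"''')
nested = parse_hash_raw(
    r'''r##"Here's r#"the raw string syntax"# in raw string"##'''
)
triple = parse_hash_raw(r'''r###"and here's a '##"' as well"###''')
```

The writer never counts the characters of the text itself, only picks a
hash count larger than any quote-plus-hashes run inside it.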


I don't know if Rust permits implicit string concatenation, but a quick 
search convinces me it doesn't.


The combination of Python's triple-quote string literal, together with 
implicit concatenation, is a powerful way to deal with extremely complex 
string literals, although it does require breaking them into pieces 
occasionally, mostly when including a string describing the triple-quote 
syntax. Note that regex searching for triple-quotes can use "{3} or '{3} 
to avoid the need to embed triple-quotes in the regex.
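
A small sketch of both points, using only the standard `re` module. The
variable names and sample text are made up for the example.

```python
import re

# "{3} matches three double quotes without the pattern containing them.
TRIPLE_DOUBLE = re.compile(r'"{3}')
TRIPLE_SINGLE = re.compile(r"'{3}")

source = 'doc = """text"""'
hits = TRIPLE_DOUBLE.findall(source)

# A literal that has to mention both triple-quote forms, broken into
# pieces that are joined by implicit concatenation.
description = (
    "Block strings open with "
    '"""'
    " or "
    "'''"
    "."
)
```

Each piece uses whichever quote style does not clash with its contents,
so no backslash escapes are needed.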


Perl's "choice of delimiter" syntax is maybe a bit more convenient 
sometimes, but makes parsing of long strings mentally exhausting 
(although it is quick for the interpreter), due to needing to remember 
what character is being used as the delimiter.


My proposal isn't intended to change the overall flavor of Python's 
string syntax, just to regularize and simplify it, while allowing 
additional escapes and other extensions to be added in the future, 
without backward-compatibility issues.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PUY3DUQKAPQEQLPRZCZX2NAD4Z2KPIJW/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-15 Thread Petr Viktorin

On 8/15/19 10:40 AM, Greg Ewing wrote:

If we want a truly raw string format that allows all characters,
including any kind of quote, we could take a tip from Fortran:

     s = 31HThis is a "totally raw" string!


Or from Rust:

let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AB565Z6Z6VVUQR74VWLRCA6R2J6NZGAP/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-15 Thread Greg Ewing

If we want a truly raw string format that allows all characters,
including any kind of quote, we could take a tip from Fortran:

s = 31HThis is a "totally raw" string!

--
Greg
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DO6CH72RHRZLE2M6ILAEYFLZ3FD6D5KN/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-14 Thread Glenn Linderman

On 8/14/2019 8:02 AM, Random832 wrote:

On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:

Please no more combinations. The presence of both legal and illegal
combinations is already a mild nightmare for processing and testing.
idlelib.colorizer has the following re to detect legal combinations

  stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

More advanced syntax highlighting editors have to handle each string type separately 
anyway, because they highlight (valid) backslash-escapes and f-string formatters. 
The proposed 'v-string' type would need separate handling even in a simplistic 
editor like IDLE, because it's different at the basic level of \" not ending 
the string (whereas, for better or worse, all current string types have exactly the 
same rules for how to find the end delimiter)
I had to read this several times, and then only after reading Eric's 
reply, it finally hit me that what you are saying is that \" doesn't end 
the string in any other form of string, but that sequence would end a 
v-string.


It seems that also explains why Serhiy, in describing his experiment 
with really raw string literals, mentioned having to change the 
tokenizer as well as the parser (proving that it isn't impossible to 
deal with truly raw strings).


\" not ending a raw string was certainly a gotcha for me when I started 
using Python (with a background in C and Perl among other languages), 
and it convinced me not to use raw strings: that gotcha was not worth 
the other benefits of raw strings. Serhiy said:
Currently a raw literal cannot end in a single backslash (e.g. in 
r"C:\User\"). Although there are reasons for this. It is an old 
gotcha, and there are many closed issues about it. This question is 
even included in FAQ. 
which indicates that I am not the only one that has been tripped up by 
that over the years.


Trying to look at it from the eyes of a beginning programmer, the whole 
idea of backslash being an escape character is an unnatural artifice. 
I'm unaware (but willing to be educated) of any natural language, when 
using quotations, that has such a  concept. Nested quotations exist, in 
various forms:  use of a different quotation mark for the inner and 
outer quotations, and block quotations (which in English, have increased 
margin on both sides, and have a blank line before and after).


Python actually supports constructs very similar to the natural language 
formats, allowing both  " and ' for quotations and nested quotations, 
and the triple-quoted string with either " or ' is very similar in 
concept to a block quotation. But _all_ the strings forms are burdened 
with surprises for the beginning programmer: escape sequences of one 
sort or another must be learned and understood to avoid surprises when 
using the \ character.


Programming languages certainly need an escape character mechanism to 
deal with characters that cannot easily be typed on a keyboard (such as 
¤ ¶ etc.), or which are visually indistinguishable from other characters 
or character sequences (various widths of white space), or which would 
be disruptive to the flow of code or syntax if represented by the usual 
character (newline, carriage return, formfeed, maybe others). But these 
are programming concepts, not natural language concepts.  The basic 
concept of a quoted string should best be borrowed directly from natural 
language, and then enhancements to that made to deal with programming 
concepts.


In Python, as in C, the escape characters are built into the basic string 
syntax; one must learn the quirks of the escaping mechanism in order to 
write


In Perl, " strings include escapes, and ' strings do not. So there is a 
basic string syntax that is similar to natural language, and one that is 
extended to include programming concepts. [N.B. There are lots of 
reasons I switched from Perl to Python, and don't have any desire to go 
back, but I have to admit, that the lack of a truly raw string in Python 
was a disappointment.]


So that, together with the desire for new escape sequences, the 
creation of a new escape mechanism in the f-string {} (which adds both { 
and } as escape characters by requiring them to be doubled to be treated 
as literal inside an f-string, instead of using \{ and \} as the escapes 
[which would have been possible, due to the addition of the f prefix]), 
and the issue that every current \-escape is defined to do something, 
is why I suggested elsewhere in this thread that perhaps the whole 
irregular string syntax should be rebooted with a future import. It 
seems it could be simpler, more regular, and more powerful as a 
result. And by using a future import, there are no backward 
incompatibility issues, and migration can be module by module.


The more I think about this, the more tempting it is to attempt to fork 
Python just to have a better string 

[Python-Dev] Re: What to do about invalid escape sequences

2019-08-14 Thread Eric V. Smith



On 8/14/2019 11:02 AM, Random832 wrote:

On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:

Please no more combinations. The presence of both legal and illegal
combinations is already a mild nightmare for processing and testing.
idlelib.colorizer has the following re to detect legal combinations

  stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"


More advanced syntax highlighting editors have to handle each string type separately 
anyway, because they highlight (valid) backslash-escapes and f-string formatters. 
The proposed 'v-string' type would need separate handling even in a simplistic 
editor like IDLE, because it's different at the basic level of \" not ending 
the string (whereas, for better or worse, all current string types have exactly the 
same rules for how to find the end delimiter)


The reason I defined f-strings as I did is so that lexer/parsers 
(editors, syntax highlighters, other implementations, etc.) could easily 
ignore them, at least as a first pass. They're literally like all other 
strings to the lexer. Python's lexer/parser says that a string is:


- some optional letters, making the string prefix
- an opening quote or triple quote
- some optional chars, with \ escaping
- a matching closing quote or triple quote

The parser then validates the string prefix ('f' is okay, 'b' is okay, 
'fb' isn't okay, 'x' isn't okay, etc.) It then operates on the contents 
of the string, based on what the string prefix tells it to do.
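
The four-part description above can be approximated with one regular
expression. This is only a sketch of the idea, not CPython's tokenizer;
`STRING_TOKEN`, `VALID_PREFIXES` and `scan` are names invented here, and
the prefix set is the one from the idlelib regex quoted earlier.

```python
import re

QUOTES = ('"""', "'''", '"', "'")

STRING_TOKEN = re.compile(
    r"(?P<prefix>[A-Za-z]*)"                       # optional letters
    r"(?P<quote>" + "|".join(QUOTES) + r")"        # opening quote
    r"(?P<body>(?:\\.|(?!(?P=quote))[^\\])*)"      # chars, with \ escaping
    r"(?P=quote)",                                 # matching closing quote
    re.DOTALL,
)

VALID_PREFIXES = {"", "r", "u", "f", "fr", "rf", "b", "br", "rb"}


def scan(source: str):
    """Return (prefix, body, prefix_is_valid), or None if no string ends."""
    m = STRING_TOKEN.fullmatch(source)
    if m is None:
        return None
    return m["prefix"], m["body"], m["prefix"].lower() in VALID_PREFIXES


f_string = scan('f"say \\"hi\\""')      # source text: f"say \"hi\""
bad_prefix = scan('fb"x"')              # lexes fine, prefix rejected later
unterminated = scan('r"C:\\User\\"')    # source text: r"C:\User\"
```

Because the end of the string is found the same way for every prefix, a
tool that only needs to skip strings can treat the prefix as opaque.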


So all an alternate lexer/parser has to do is add 'f' to the valid 
string prefixes, and it could then at least skip over f-strings. 
Somewhere in my notes I have 3 or 4 examples of projects that did this, 
and voila: they "supported" f-strings. Imagine a syntax highlighter that 
didn't want to highlight the inside of an f-string.


The proposed v-strings would indeed break this. I'm opposed to them for 
this reason, among others.


That all said, I am considering moving f-string parsing into the CPython 
parser. That would let you say things like:


f'some text {ord('a')}'

I'm not sure that's a great idea, but I've discussed it with several 
alternate implementations, and with authors of several editors, and they 
seem okay with it. I'm following Guido's parser experiment with some 
interest, to see how it might interact with this proposal. Might they 
also be okay with v-strings? Maybe. But it seems like a lot of hassle 
for a very minor feature.


Eric
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BDZCXGRW5KTUOGMRT6OHH6S3UD4BV5ZV/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-14 Thread Random832
On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:
> Please no more combinations. The presence of both legal and illegal 
> combinations is already a mild nightmare for processing and testing. 
> idlelib.colorizer has the following re to detect legal combinations
> 
>  stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

More advanced syntax highlighting editors have to handle each string type 
separately anyway, because they highlight (valid) backslash-escapes and 
f-string formatters. The proposed 'v-string' type would need separate handling 
even in a simplistic editor like IDLE, because it's different at the basic 
level of \" not ending the string (whereas, for better or worse, all current 
string types have exactly the same rules for how to find the end delimiter)
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/H77NNYCZI37JCHGSMIHMTKNQVK5SGCWY/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-13 Thread Serhiy Storchaka

12.08.19 22:51, Glenn Linderman пише:

On 8/12/2019 12:11 AM, Serhiy Storchaka wrote:
For example, in many cases `\"` can be replaced with 
`"'"'r"`, but it does not look very readable.


No, that is not readable.  But neither does it seem to be valid syntax, 
or else I'm not sure what you are saying. Ah, maybe you were saying that 
a sequence like the '\"' that is already embedded in a raw string can be 
converted to the sequence `"'"'r"` also embedded in the raw string. That 
makes the syntax work, but if that is what you were saying, your 
translation dropped the \ from before the ", since the raw string 
preserves both the \ and the ".


Yes, this is what I meant. Thank you for the correction. I dropped the 
`\` because in the context of regular expressions `\"` and `"` are the 
same, and the backslash is only used to prevent `"` from ending the 
string literal. This is why `\"` is so rarely used in other strings: 
only in regular expressions does the `\` before `"` not matter.
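
A short sketch of the two spellings, with spaces added between the
concatenated pieces for readability. The pattern text and variable names
are made up for the example.

```python
import re

# Raw literal: the backslash and the quote are both kept in the string.
with_escape = r"say \"hi\""
# Same regex without \": the quote comes from a separate '"' literal.
concatenated = r"say "  '"'  r"hi"  '"'

target = 'say "hi"'
match_a = re.fullmatch(with_escape, target)
match_b = re.fullmatch(concatenated, target)
```

The two strings differ (the first is two backslashes longer), but as
regular expressions they match the same text.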


Regarding the readability, I think any use of implicitly concatenated 
strings should have at least two spaces or a newline between them to 
make the implicit concatenation clearer.


Agree. I wrote it without spaces for dramatic effect.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IFRYF5GUDNUTF7EJPYZO2QY3VHCM7FPK/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Neil Schemenauer
On 2019-08-10, Serhiy Storchaka wrote:
> Actually we need to distinguish the author and the user of the code and
> show warnings only to the author. Using .pyc files was just a heuristic:
> the author compiles the Python code, and the user uses compiled .pyc files.
> It would be nice to have a more reliable way to determine who owns the code.
> It is related not only to SyntaxWarnings, but also to runtime
> DeprecationWarnings. Maybe silence warnings only for readonly files and make
> files installed by pip readonly?

Identifying the author vs the user seems like a good idea.  Relying
on the OS filesystem seems like a solution that would cause some
challenges.  Can we embed that information in the .pyc file instead?
That way, Python knows that it is a module/package that has been
installed with pip or similar and the end user is likely not the
author.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OVUKO7BJHG3JBKKGOWYWK4HTJ4SICCSK/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Glenn Linderman

On 8/12/2019 12:11 AM, Serhiy Storchaka wrote:

11.08.19 23:07, Glenn Linderman пише:

On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:

10.08.19 22:10, Glenn Linderman пише:
I wonder how many raw strings actually use the \"  escape 
productively? Maybe that should be deprecated too! ?  I can't think 
of a good and necessary use for it, can anyone?


This is an interesting question. I have performed some experiments. 
15 files in the stdlib (not counting the tokenizer) use \' or \" in 
raw strings. And one test (test_venv) fails because third-party code 
uses them. All cases are in regular expressions. It is possible to 
rewrite them, but it is a less trivial task than fixing invalid escape 
sequences. So changing this will require a much longer deprecation 
period.


Couldn't they be rewritten using the above idiom? Why would that be 
less trivial?
Or by using triple quotes, so the \" could be written as " ? That 
seems trivial.


Yes, they could. You can use a different quote character, triple quotes, 
or string literal concatenation. There are many options, and you should 
choose what is applicable in any particular case and what is optimal. 
You need to analyze the whole string literal, and the code transformation 
is usually more complex than just duplicating a backslash or adding 
the `r` prefix. For example, in many cases `\"` can be replaced with 
`"'"'r"`, but it does not look very readable.


No, that is not readable.  But neither does it seem to be valid syntax, 
or else I'm not sure what you are saying. Ah, maybe you were saying that 
a sequence like the '\"' that is already embedded in a raw string can be 
converted to the sequence `"'"'r"` also embedded in the raw string. That 
makes the syntax work, but if that is what you were saying, your 
translation dropped the \ from before the ", since the raw string 
preserves both the \ and the ".


Regarding the readability, I think any use of implicitly concatenated 
strings should have at least two spaces or a newline between them to 
make the implicit concatenation clearer.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VGFQFBKNPNKBJLOPDHQWGCJ6WPK7IDKT/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Terry Reedy

On 8/12/2019 6:34 AM, Eric V. Smith wrote:

On 8/12/2019 2:52 AM, Greg Ewing wrote:

Eric V. Smith wrote:
I'm not in any way serious about this. I just want people to realize 
how many wacky combinations there would be.


It doesn't matter how many combinations there are, as long as
multiple prefixes combine in the way you would expect, which
they do as far as I can see.


In general I agree, although there's some cognitive overhead to which 
combinations are valid or not. There's no "fu" strings, for example.


But for reading code that doesn't matter, so your point stands.


Please no more combinations. The presence of both legal and illegal 
combinations is already a mild nightmare for processing and testing. 
idlelib.colorizer has the following re to detect legal combinations


stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

and the following test strings to make sure it works

"# All valid prefixes for unicode and byte strings should be colored.\n"
"'x', '''x''', \"x\", \"\"\"x\"\"\"\n"
"r'x', u'x', R'x', U'x', f'x', F'x'\n"
"fr'x', Fr'x', fR'x', FR'x', rf'x', rF'x', Rf'x', RF'x'\n"
"b'x',B'x', br'x',Br'x',bR'x',BR'x', rb'x', rB'x',Rb'x',RB'x'\n"
"# Invalid combinations of legal characters should be half colored.\n"
"ur'x', ru'x', uf'x', fu'x', UR'x', ufr'x', rfu'x', xf'x', fx'x'\n"

Or, if another prefix is added, please add an expanded 
guaranteed-correct regex to the stdlib somewhere.
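
One way such a regex could be cross-checked is to ask the compiler which
prefixes it accepts and compare. The sketch below does this for all
lowercase combinations of up to two of the letters r, u, f and b; the
helper `accepted_by_compiler` is a name invented here, and the check
reflects whatever interpreter runs it.

```python
import itertools
import re

stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"  # regex quoted in this message


def accepted_by_compiler(prefix: str) -> bool:
    """Return True if prefix + 'x' compiles as a string literal."""
    try:
        compile(prefix + "'x'", "<example>", "eval")
    except SyntaxError:
        return False
    return True


letters = "rufb"
candidates = [""] + [
    "".join(combo)
    for n in (1, 2)
    for combo in itertools.product(letters, repeat=n)
]
by_regex = {p for p in candidates if re.fullmatch(stringprefix, p)}
by_compiler = {p for p in candidates if accepted_by_compiler(p)}
```

If the two sets ever diverge, the regex is out of date for that
interpreter.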


--
Terry Jan Reedy
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EVDDJEA25YKPTKX6RZY55Q66NJWTOH3A/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Terry Reedy

On 8/7/2019 6:57 PM, raymond.hettin...@gmail.com wrote:

For me, these warnings are continuing to arise almost daily.  See two recent 
examples below.


Both examples are fragile, as explained below.  They make me more in 
favor of no longer guessing what \ means in the default mode.


The transition is a different matter.  I wonder if future imports could 
be (or have been) used.



In both cases, the code previously had always worked without complaint.


Because they are in the subset of examples of the type that work 
without adding an r prefix.  Others in the class require an r prefix.


Ascii art:


''' How old-style formatting works with positional placeholders
print('The answer is %d today, but was %d yesterday' % (new, old))
  \o
   \o
'''


In general, ascii art needs an r prefix.  Even if one example gets away 
without, an edited version or a new example may not.  In the example 
above, the o looks weird.  Suppose '\' were used instead.  Suppose one 
pointed to parentheses instead and ended up with this teaching example.


'''Sample code with parentheses:
print('The answer is %d today, but was %d yesterday' % (new, old))
\---\
  \--\
These parentheses are properly nested.
'''
Whoops. This is what I mean by fragile.

A new example:

alpha_slide = '''
-
\abcd
*\bcd
**\cd
***\d
\
-
'''
print(alpha_slide)
# This looks nice in source, but the result is
-
bcd
*cd
**\cd
***\d
-
where the appearance of \a and \b depends on the output device.
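
The same effect on single lines, as a minimal sketch. Only valid escapes
are used here so that the snippet itself compiles without warnings; the
variable names are made up.

```python
cooked = '\abcd'   # \a is cooked into BEL (0x07), leaving 4 characters
raw = r'\abcd'     # raw: backslash and 'a' both survive, 5 characters
slide = '*\bcd'    # \b is cooked into backspace (0x08)

# A backslash at the very end of a line joins it with the next line.
continued = '''one\
two'''
```

None of these produce a warning, which is why the damage goes unnoticed
until the output is inspected.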

Ascii art never needs cooking.  I would teach "Always prefix ascii art 
with r" in preference to "Don't bother prefixing ascii art with r unless 
you really have to because you use one of a memorized list of 
escapes, and promise yourself to recheck and add it if needed every time 
you edit, and are able to keep that promise".


vCard data item:


# Cut and pasted from:
# https://en.wikipedia.org/wiki/VCard#vCard_2.1
vcard = '''
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
  =0ABaytown\, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
  Baytown, LA 30314=0D=0AUnited States of America
EMAIL:forrestg...@example.com
REV:20080424T195243Z
END:VCARD
'''


Thank you for including the link so I could learn more.  In general, 
vCard representations should be raw.  The above uses the vCard 2.1 spec. 
 The more commonly used 3.0 and 4.0 specs replace "=0D=0A=" in the 2.1 
spec with a raw "\n".  If the above were updated, it might appear to 
'work', but would, I believe, fail if fed to a vCard processor.  This is 
what I mean by 'fragile'.


I would rather teach beginners the easily remembered "Always prefix 
vCard representations with 'r'" rather than "Only prefix vCard 
representations with 'r' if you use the more common newer specs and use 
'\n', as you often would."  (I don't know if raw '\t' is ever used; if 
so, add that.)


The above is based on the idea that while bytes and strings are 
'sequences of characters (codes)', they are usually used to represent 
instances of often undeclared types of data.  If the strings of a data 
type never need cooking, and may contain backslashes that could be 
cooked but must not be, the easiest rule is to always prefix with 'r'.
(Those with experience can refine it if they wish.)  If instances 
contain some backslashes that must be cooked, omit 'r' and double any 
backslashes that must be left alone.
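
A small sketch of the two rules. The label text is an invented fragment
modeled on the vCard example, and the assumption that a consumer wants
the two-character sequence \n is taken from the description above.

```python
# Rule 1: data that never needs cooking gets an r prefix.
raw_label = r"100 Waters Edge\nBaytown\, LA 30314"

# Rule 2: without r, double every backslash that must be left alone.
doubled_label = "100 Waters Edge\\nBaytown\\, LA 30314"

# What goes wrong: a single \n without r is baked into a real newline.
cooked_label = "100 Waters Edge\nBaytown"

# Mixed case: \t is meant to be cooked, the doubled backslash is not.
mixed = "name:\tC:\\temp"
```

The raw and doubled spellings produce the same string; the cooked one
silently differs.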


--
Terry Jan Reedy
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TOV634XSDSM57ZZYGDOMBFNUT6VVI3P7/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Steve Holden
On Mon, Aug 12, 2019 at 6:26 PM Terry Reedy  wrote:

> On 8/8/2019 5:31 AM, Dima Tisnek wrote:
> [...]
>
> To me, this is one of the major problems with the half-baked default.
> People who want string literals left as is sometimes get away with
> omitting explicit mention of that fact, but sometimes don't.
>
> Note: when we added '\u' and '\U' escapes, we broke working code that
> had Windows paths like "C:\Users\Terry".  But we did it anyway.
>

It might be helpful if there were some sort of declaration that the
ultimate goal, despite the backwards incompatibility it would entail, is
removing this wart from the language.

While practicality does indeed often beat purity, I feel this particular
case may be the exception that proves the rule. Onwards to 4.0!
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6AMS2N4O53RZ4BKTAB3GNPADZBGA4T7B/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Terry Reedy

On 8/8/2019 5:31 AM, Dima Tisnek wrote:

These two ought to be converted to raw strings, shouldn't they?


For the first example, yes or no. It depends ;-)  See below.

The problem is that string literals in python code are, by default, 
half-baked.  The interpretation of '\' by the python parser, and the 
resulting string object, depends on the next char.  I can see how this 
is sometimes a convenience, but I consider it a design bug.  There is no 
way for a user to say "I intend for this string to be fully baked, so if 
it cannot be, I goofed."  And the convenience gets used when it must not be.
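
The half-baked behaviour can be observed by compiling small literals and
recording any warnings. This is a sketch only: `escape_warnings` is a
name invented here, the sample text is borrowed from the vCard example,
and the warning category depends on the Python version (and a future
version might raise an error instead).

```python
import warnings


def escape_warnings(source: str) -> list:
    """Compile `source` and return the names of any warning categories."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return [w.category.__name__ for w in caught]


plain = escape_warnings(r"'Baytown\, LA'")   # \, is not a known escape
raw = escape_warnings(r"r'Baytown\, LA'")    # r prefix: nothing to cook
known = escape_warnings(r"'Baytown\n LA'")   # \n is cooked silently
```

Only the first literal is flagged; the third is cooked without comment,
whether or not that was intended.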



On Thu, 8 Aug 2019 at 08:04,  wrote:


For me, these warnings are continuing to arise almost daily.  See two recent 
examples below.  In both cases, the code previously had always worked without 
complaint.

- Example from yesterday's class 

''' How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old))
  \o
   \o
'''

SyntaxWarning: invalid escape sequence \-


For true ascii-only character art, where one will never want '\' baked, 
an 'r' prefix is appropriate.  It is in fact mandatory when '\' may be 
followed by a legal escape code.



If one is making unicode art, with '\u' and '\U' escapes used, one must 
not use the 'r' prefix, but should instead use '\\' for unbaked 
backslashes.  The unicode escapes have already thrown off column alignments.



- Example from today's class 

# Cut and pasted from:
# https://en.wikipedia.org/wiki/VCard#vCard_2.1
vcard = '''
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
  =0ABaytown\, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
  Baytown, LA 30314=0D=0AUnited States of America
EMAIL:forrestg...@example.com
REV:20080424T195243Z
END:VCARD
'''

SyntaxWarning: invalid escape sequence \,


Based on my reading of the Wikipedia vCard page linked above,
the vCard protocol mandates use of '\' chars that must be passed through 
unbaked to a vCard processor.  (I don't know why '\,', but it does not 
matter.)  So vCard strings using '\' should generally have 'r' prefixes, 
just as for regex and latex strings.  For version 2.1, it appears that 
one can currently, in 3.7-, get away with omitting 'r'.  In versions 3.0 
and 4.0, embedded 'newline' is represented by '\n' instead of '=0D=0A'. 
It must not be baked by python, but passed on as is.  So omitting 'r' 
becomes a bug for those versions.
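
A minimal sketch of the difference, assuming a CPython that still compiles unknown escapes (the helper name `escape_diagnostics` and the sample strings are made up for illustration). Depending on the Python version the diagnostic is a DeprecationWarning, a SyntaxWarning, or possibly a SyntaxError in some future release:

```python
import warnings

def escape_diagnostics(source):
    """Compile source text and report the names of any diagnostics raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, "<vcard-demo>", "exec")
        except SyntaxError:
            return ["SyntaxError"]
    return [w.category.__name__ for w in caught]

plain_source = "label = 'Baytown\\, LA 30314'"   # source text: 'Baytown\, LA 30314'
raw_source = "label = r'Baytown\\, LA 30314'"    # source text: r'Baytown\, LA 30314'

print(escape_diagnostics(plain_source))  # at least one diagnostic
print(escape_diagnostics(raw_source))    # no diagnostics

# With a recognized escape the non-raw literal is silently baked:
baked = "Baytown\nLA"      # contains a real newline
unbaked = r"Baytown\nLA"   # contains a backslash followed by 'n'
print(len(baked), len(unbaked))
```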


To me, this is one of the major problems with the half-baked default.
People who want string literals left as is sometimes get away with 
omitting explicit mention of that fact, but sometimes don't.


Note: when we added '\u' and '\U' escapes, we broke working code that 
had Windows paths like "C:\Users\Terry".  But we did it anyway.
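
This breakage is easy to reproduce on Python 3. The sketch below compiles the source text at run time so that the failure can be caught; the variable names are only illustrative:

```python
# Source text of a non-raw literal containing \U followed by non-hex digits.
broken_source = r'path = "C:\Users\Terry"'
# The same path written as a raw literal.
raw_source = r'path = r"C:\Users\Terry"'

def compiles(source):
    """Return True if the source text compiles, False on SyntaxError."""
    try:
        compile(source, "<path-demo>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles(broken_source))  # False: truncated \UXXXXXXXX escape
print(compiles(raw_source))     # True
```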


--
Terry Jan Reedy
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NZZ32WFHUMQAKG6O3KDYV5J5NQMWGKSO/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Steve Dower

On 10Aug2019 1544, Glenn Linderman wrote:

On 8/10/2019 3:36 PM, Greg Ewing wrote:

It might be better to introduce a new string prefix, e.g.
'v' for 'verbatim':

   v"C:\Users\Fred\"

Which is why I suggested  rr"C:\directory\", but allowed that there 
might be better spellings. I like your  v  for verbatim!


The only new prefix I would support is 'p' to construct a pathlib.Path 
object directly from the string literal. But that doesn't change any of 
the existing discussion (apart from please take all the new prefix 
suggestions to python-ideas).


People have been solving the trailing backslash problem for a long time, 
and it's not a big enough burden to need a new fix.


Unintentional escapes in paths are a much bigger burden for new users 
and deserve a fix, but our current warning about the upcoming change is 
not targeted at the right people. Because we intend to fix the warning, 
delaying it by a release is not just "kicking the can down the road". 
But we need some agreement on what that looks like.


The bug is already at https://bugs.python.org/issue32912

Cheers,
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YVL3J7A4AM43NSUPUHMIMVZ7NT3WC2AZ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Eric V. Smith

On 8/12/2019 2:52 AM, Greg Ewing wrote:

Eric V. Smith wrote:
I'm not in any way serious about this. I just want people to realize 
how many wacky combinations there would be.


It doesn't matter how many combinations there are, as long as
multiple prefixes combine in the way you would expect, which
they do as far as I can see.


In general I agree, although there's some cognitive overhead to which 
combinations are valid or not. There's no "fu" strings, for example.
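
A quick way to check which combinations a given interpreter accepts is to compile a tiny literal with each prefix. This is only a sketch; the helper name is made up, and the results shown in the comments are what current Python 3 releases give:

```python
def is_valid_prefix(prefix):
    """Return True if prefix + '"x"' compiles as an expression."""
    try:
        compile(prefix + '"x"', "<prefix-demo>", "eval")
    except SyntaxError:
        return False
    return True

for prefix in ("u", "rb", "fr", "fu", "ub", "bf"):
    print(prefix, is_valid_prefix(prefix))
# u, rb and fr are accepted; fu, ub and bf are rejected.
```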


But for reading code that doesn't matter, so your point stands.

Eric
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VD4BZYX2UV2GBU22PSZKDQAANQ43EZ54/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Serhiy Storchaka

11.08.19 23:07, Glenn Linderman wrote:

On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:

10.08.19 22:10, Glenn Linderman wrote:
I wonder how many raw strings actually use the \"  escape 
productively? Maybe that should be deprecated too! ?  I can't think 
of a good and necessary use for it, can anyone?


This is an interesting question. I have performed some experiments. 15 
files in the stdlib (not counting the tokenizer) use \' or \" in raw 
strings. And one test (test_venv) fails because they are used in 
third-party code. All cases are in regular expressions. It is possible 
to rewrite them, but it is a less trivial task than fixing invalid 
escape sequences. So changing this would require a much longer 
deprecation period.


Couldn't they be rewritten using the above idiom? Why would that be less 
trivial?
Or by using triple quotes, so the \" could be written as " ? That seems 
trivial.


Yes, they could. You can use a different quote character, triple quotes, 
or string literal concatenation. There are many options, and you should 
choose what is applicable in each particular case and what is optimal. 
You need to analyze the whole string literal, and the code transformation 
is usually more complex than just duplicating a backslash or adding the 
`r` prefix. For example, in many cases `\"` can be replaced with 
`"'"'r"`, but the result is not very readable.


See https://github.com/python/cpython/pull/15217.
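
To make the options concrete, here is a small sketch with a made-up regular expression (not one of the stdlib cases). The rewritten literals are different strings from the original, since the backslash is gone, but as regular expressions they match the same text:

```python
import re

# In a raw string, \" keeps both characters: a backslash and a quote.
with_escape = r"name=\"(\w+)\""

# Rewrites that avoid \" inside a raw string:
other_quote = r'name="(\w+)"'                 # different quote character
triple_quoted = r'''name="(\w+)"'''           # triple quotes
concatenated = r"name=" '"' r"(\w+)" '"'      # string literal concatenation

text = 'name="spam"'
patterns = [with_escape, other_quote, triple_quoted, concatenated]
matches = [re.search(p, text).group(1) for p in patterns]
print(matches)
```

The concatenated form corresponds to the replacement mentioned above, and it shows why that form is the least readable of the three.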
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KSOBUCTZITXAI3KG77DVST7U4DBPPKGR/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Greg Ewing

Eric V. Smith wrote:
I'm not in any way serious about this. I just want people to realize how 
many wacky combinations there would be.


It doesn't matter how many combinations there are, as long as
multiple prefixes combine in the way you would expect, which
they do as far as I can see.

--
Greg
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CMMOYWF7DOX4K5CS2IONDXE4DEJGAUT4/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-12 Thread Glenn Linderman

On 8/11/2019 8:40 PM, Eric V. Smith wrote:

On 8/11/2019 4:18 PM, Glenn Linderman wrote:

On 8/11/2019 2:50 AM, Steven D'Aprano wrote:

On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:


Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or "c:\directory\"

And that brings me to the thought that if   \e  wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define ee"string where \\
can only be used as an escape or escaped character, \e means the ASCII
escape character, and \ followed by a character with no escape
definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings,
rb-strings, fr-strings, rf-strings, each of which comes in four
varieties (single quote, double quote, triple single quote and triple
double quote). Now you're talking about adding rr-strings, v-strings
(Greg suggested that) and ee-strings, presumably some or all of which
will need b*- and *b- or f*- and *f- varieties too.


Don't forget the upper & lower case varieties :)


And all orders!

>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 
'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}

>>> len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:

{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 
'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 
'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 
'bv', 'b', 'u', 'f', 'rf'}


There would be no need for 'uv' (not needed for backward 
compatibility) or 'rv' (can't be both raw and verbatim).


I'm not in any way serious about this. I just want people to realize 
how many wacky combinations there would be. And heaven forbid we ever 
add some combination of 3 characters. If 'rfv' were actually also 
valid, you get to 89:


{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 
'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 
'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 
'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 
'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 
'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 
'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 
'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 
'rfv', 'fRV', 'frv', 'RvF'}
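
The counts quoted above can be reproduced with a few lines of itertools. This is an independent sketch, not the private helper from the tokenize module; the function name and the list of base prefixes are assumptions chosen to match the 25 prefixes shown:

```python
from itertools import permutations, product

def all_prefixes(base):
    """Return every ordering and upper/lower casing of each base prefix."""
    result = {''}
    for prefix in base:
        for ordering in permutations(prefix):
            cases = [(c.lower(), c.upper()) for c in ordering]
            for cased in product(*cases):
                result.add(''.join(cased))
    return result

current = ['b', 'r', 'u', 'f', 'br', 'fr']
print(len(all_prefixes(current)))                          # 25
print(len(all_prefixes(current + ['bv', 'fv'])))           # 41
print(len(all_prefixes(current + ['bv', 'fv', 'rfv'])))    # 89
```

Each two-letter prefix contributes 2 orderings times 4 casings, and a three-letter prefix contributes 6 orderings times 8 casings, which is where the jump from 41 to 89 comes from.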


If only we could deprecate upper case prefixes!

Eric


Yes. Happily while there is a combinatorial explosion in spellings and 
casings, there is no cognitive overload: each character has an 
independent effect on the interpretation and use of the string, so once 
you understand the 5 existing types (b r u f and plain) you understand 
them all.


Should we add one or two more, it would be with the realization 
(hopefully realized in the documentation also) that v and e would 
effectively be replacements for r and plain, rather than being combined 
with them.


Were I to design a new language with similar string syntax, I think I 
would use plain quotes for verbatim strings only, and have the following 
prefixes, in only a single case:


(no prefix) - verbatim UTF-8 (at this point, I see no reason not to 
require UTF-8 for the encoding of source files)

b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings

Actually, the above could be done as a preprocessor for python, or a 
future import. In other words, what you see is what you get, until you 
add a prefix to add additional processing.  The only combinations that 
seem useful are  eb  and  ef.  I don't know whether constraining the order 
of the prefixes would be helpful; if it is, I have no problem with a 
canonical ordering being prescribed.


As a future import, one could code modules to either the current 
combinatorial explosion with all its gotchas, special cases, and passing 
of undefined escapes; or one could code to the clean limited cases above.


Another thing that seems awkward about the current strings is that {{ 
and }} become "special escapes".  If it were not for the permissive 
usage of \{ and \} in the current plain string processing, \{ and \} 
could have been used to escape the non-format-expression uses of { and 
}, which would be far more consistent with other escapes.  Perhaps the 
future import could regularize that, also.
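
For reference, this is how the doubled-brace escapes behave today in f-strings and in str.format (the variable names are only for illustration):

```python
value = 42

# {{ and }} produce literal braces; {value} is substituted.
mixed = f"{{value}} is literal, {value} is substituted"
wrapped = f"{{{value}}}"            # literal braces around the substituted value
formatted = "{{}} holds {}".format(value)

print(mixed)
print(wrapped)
print(formatted)
```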


A future import would have no backward compatibility issues to disrupt a 
simplified, more regular syntax.


Does anyone know of an existing feature that couldn't be expressed in a 
straightforward manner with only the above capabilities?



The only other thing that I have heard about regarding strings is that 
multi-line strings have their first line indented, and other lines not. 
Some have recommended making the 

[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Eric V. Smith

On 8/11/2019 4:18 PM, Glenn Linderman wrote:

On 8/11/2019 2:50 AM, Steven D'Aprano wrote:

On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:


Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or "c:\directory\"

And that brings me to the thought that if   \e  wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define   ee"string where \\
can only be used as an escape or escaped character, \e means the ASCII
escape character, and \ followed by a character with no escape
definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings,
rb-strings, fr-strings, rf-strings, each of which comes in four
varieties (single quote, double quote, triple single quote and triple
double quote). Now you're talking about adding rr-strings, v-strings
(Greg suggested that) and ee-strings, presumably some or all of which
will need b*- and *b- or f*- and *f- varieties too.


Don't forget the upper & lower case varieties :)


And all orders!

>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 
'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}

>>> len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:

{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 
'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 
'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 
'bv', 'b', 'u', 'f', 'rf'}


There would be no need for 'uv' (not needed for backward compatibility) 
or 'rv' (can't be both raw and verbatim).


I'm not in any way serious about this. I just want people to realize how 
many wacky combinations there would be. And heaven forbid we ever add 
some combination of 3 characters. If 'rfv' were actually also valid, you 
get to 89:


{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 
'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 
'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 
'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 
'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 
'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 
'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 
'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 
'frv', 'RvF'}


If only we could deprecate upper case prefixes!

Eric

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B26LJOLLKKVDSQR6ZUVZKSFCU4WNXYC5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Glenn Linderman

On 8/11/2019 2:50 AM, Steven D'Aprano wrote:

On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:


Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or "c:\directory\"

And that brings me to the thought that if   \e  wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define   ee"string where \\
can only be used as an escape or escaped character, \e means the ASCII
escape character, and \ followed by a character with no escape
definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings,
rb-strings, fr-strings, rf-strings, each of which comes in four
varieties (single quote, double quote, triple single quote and triple
double quote). Now you're talking about adding rr-strings, v-strings
(Greg suggested that) and ee-strings, presumably some or all of which
will need b*- and *b- or f*- and *f- varieties too.


Don't forget the upper & lower case varieties :)


If the plan to deprecate unrecognised escapes and then make them an
exception goes ahead, and I expect that it will, in a few more releases
this "extended escape" ee-string will be completely redundant. If \e is
required, we will be able to add it to regular strings as needed,
likewise for any future new escapes we might want. (If any.)
So unrecognized escapes were deprecated in 3.6. And didn't get removed 
in 3.7. And from all indications, aren't going to be removed in 3.8. 
What makes you think the same arguments won't happen again for 3.9?



And if we end up keeping the existing behaviour, oh well, we can always
write \x1B instead. New escapes are a Nice To Have, not a Must Have.

"Really raw" rr'' versus "nearly raw" r'' is a source of confusion just
waiting to happen, when people use the wrong numbers of r's, or are
simply unclear which they should use.
I agree that Greg's v is far better than rr, especially if someone tried 
to write rfr or rbr.

It's not like we have no other options:

 location = r'C:\directory\subdirectory' '\\'

works fine.
But I never thought of that, until Serhiy mentioned it in his reply, so 
there are probably lots of other stupid people that didn't think of it 
either. It's not like it is even suggested in the documentation as a way 
to work around the non-rawness of raw strings. And it still requires 
doubling one of the \, so it is more consistent and understandable to 
just double them all.



  So does this:

 location = 'directory/subdirectory/'.replace('/', os.sep)


This is a far greater run-time cost with the need to scan the string. 
Granted the total cost isn't huge, unless it is done repeatedly.



Even better, instead of hard-coding our paths in the source code, we can
read them from a config file or database.
Yep, I do that sometimes. But hard-coded paths make good defaults in 
many circumstances.



It is unfortunate that Windows is so tricky with backslashes and
forwards slashes, and that it clashes with the escape character, but I'm
sure that other languages which use \ for escaping haven't proliferated
four or more kinds of strings with different escaping rules in
response.


I agree with this. But Bill didn't consult Guido about the matter.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BPW6VYVKANWICN34TIOA6BVJYXX4MK3D/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Glenn Linderman

On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:

10.08.19 22:10, Glenn Linderman wrote:
As pointed out elsewhere, Raw strings have limitations, paths ending 
in \ cannot be represented, and such do exist in various situations, 
not all of which can be easily avoided... except by the "extra 
character contortion" of "C:\directory\ "[:-1]  (does someone know a 
better way?)


Another common idiom is

    r"C:\directory" "\\"


I suppose that concatenation happens at compile time; less sure about 
[:-1], I would guess not. Thanks for this.
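
One way to check the first supposition is to look at the parsed tree: adjacent string literals arrive as a single constant, so no concatenation is left to do at run time. A small sketch, assuming Python 3.8 or later (where string constants are ast.Constant nodes); it makes no claim about how [:-1] is handled:

```python
import ast

# Source text of the idiom: a raw literal followed by an escaped backslash.
source = r'''r"C:\directory" "\\"'''

node = ast.parse(source, mode="eval").body
print(type(node).__name__)   # a single constant node, not a concatenation
print(node.value)            # C:\directory\
```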


I wonder how many raw strings actually use the \"  escape 
productively? Maybe that should be deprecated too! ?  I can't think 
of a good and necessary use for it, can anyone?


This is an interesting question. I have performed some experiments. 15 
files in the stdlib (not counting the tokenizer) use \' or \" in raw 
strings. And one test (test_venv) fails because they are used in 
third-party code. All cases are in regular expressions. It is possible 
to rewrite them, but it is a less trivial task than fixing invalid 
escape sequences. So changing this would require a much longer 
deprecation period.


Couldn't they be rewritten using the above idiom? Why would that be less 
trivial?
Or by using triple quotes, so the \" could be written as " ? That seems 
trivial.


___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/46TOPB5ZY24OXXBGSLUXOQOJOASGBVTL/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Paul Moore
On Sun, 11 Aug 2019 at 03:37, Rob Cliffe via Python-Dev
 wrote:
> Usually, but not always.  I have not infrequently used files with a
> blank extension.
> I can't recall using a directory name with an extension (but I can't
> swear that I never have).

I've often seen directory names like "1. Overview" on Windows.
Technically, " Overview" would be the extension here. Of course,
that's a silly example, but the point is that there's a difference
between what's clear to a human and what's clear to a computer...
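
The standard library draws the line the same mechanical way. A short sketch using ntpath (the Windows flavour of os.path, importable on any platform); the names other than "1. Overview" are made up for illustration:

```python
import ntpath

for name in ("1. Overview", "repo.git", "README", "report.txt"):
    root, ext = ntpath.splitext(name)
    print(repr(name), "->", repr(root), repr(ext))
# "1. Overview" splits into root '1' and extension '. Overview'.
```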

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PDS2EGV77Z5B2IZWCN5LWF7XWQGBLMWQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Steven D'Aprano
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

> Or invent "really raw" in some spelling, such as rr"c:\directory\"
> or e for exact, or x for exact, or  here>"c:\directory\"
> 
> And that brings me to the thought that if   \e  wants to become an 
> escape for escape, that maybe there should be an "extended escape" 
> prefix... if you want to use more escapes, define   ee"string where \\ 
> can only be used as an escape or escaped character, \e means the ASCII 
> escape character, and \ followed by a character with no escape 
> definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings, 
rb-strings, fr-strings, rf-strings, each of which comes in four 
varieties (single quote, double quote, triple single quote and triple 
double quote). Now you're talking about adding rr-strings, v-strings 
(Greg suggested that) and ee-strings, presumably some or all of which 
will need b*- and *b- or f*- and *f- varieties too.

If the plan to deprecate unrecognised escapes and then make them an 
exception goes ahead, and I expect that it will, in a few more releases 
this "extended escape" ee-string will be completely redundant. If \e is 
required, we will be able to add it to regular strings as needed, 
likewise for any future new escapes we might want. (If any.)

And if we end up keeping the existing behaviour, oh well, we can always 
write \x1B instead. New escapes are a Nice To Have, not a Must Have.
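
For example, the ASCII escape character can already be spelled in several equivalent ways without any new escape (the ANSI sequence below is just a familiar use of it):

```python
ESC = "\x1b"

# Hex, octal and \u spellings all name the same character, code point 27.
print(ESC == "\033" == "\u001b" == chr(27))

# Typical use: an ANSI terminal sequence for bold text.
bold = f"{ESC}[1mbold{ESC}[0m"
print(repr(bold))
```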

"Really raw" rr'' versus "nearly raw" r'' is a source of confusion just 
waiting to happen, when people use the wrong numbers of r's, or are 
simply unclear which they should use.

It's not like we have no other options:

location = r'C:\directory\subdirectory' '\\'

works fine. So does this:

location = 'directory/subdirectory/'.replace('/', os.sep)
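
A side-by-side sketch of these spellings, using ntpath.sep in place of os.sep so that the result is the same on any platform (the variable names are only illustrative):

```python
import ntpath

concatenated = r"C:\directory\subdirectory" "\\"
sliced = r"C:\directory\subdirectory\ "[:-1]        # the "extra character" trick
replaced = "C:/directory/subdirectory/".replace("/", ntpath.sep)
doubled = "C:\\directory\\subdirectory\\"

print(concatenated == sliced == replaced == doubled)

# A raw literal cannot itself end in a single backslash:
try:
    compile(r'r"C:\directory\"', "<demo>", "eval")
    raw_trailing_ok = True
except SyntaxError:
    raw_trailing_ok = False
print(raw_trailing_ok)
```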

Even better, instead of hard-coding our paths in the source code, we can 
read them from a config file or database.

It is unfortunate that Windows is so tricky with backslashes and 
forwards slashes, and that it clashes with the escape character, but I'm 
sure that other languages which use \ for escaping haven't proliferated 
four or more kinds of strings with different escaping rules in 
response.



-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2ZPNZTP3B7OEG2LQQXAGGYG6B76LYDB5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Eric V. Smith

On 8/10/2019 10:30 PM, Rob Cliffe via Python-Dev wrote:



On 10/08/2019 23:30:18, Greg Ewing wrote:

Rob Cliffe via Python-Dev wrote:


Also, the former is simply more *informative* - it tells the reader 
that baz is expected to be a directory, not a file.


On Windows you can usually tell that from the fact that filenames
almost always have an extension, and directory names almost never
do.

Usually, but not always.  I have not infrequently used files with a 
blank extension.
I can't recall using a directory name with an extension (but I can't 
swear that I never have).


I most commonly see this with bare git repositories .git. And 
I've created directory names with "extensions" for my own use.


Eric
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PJ4TPHOY6ZIWI5CQ56J3BYWTEBFYNMJU/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Serhiy Storchaka

10.08.19 22:10, Glenn Linderman wrote:
As pointed out elsewhere, Raw strings have limitations, paths ending in 
\ cannot be represented, and such do exist in various situations, not 
all of which can be easily avoided... except by the "extra character 
contortion" of   "C:\directory\ "[:-1]  (does someone know a better way?)


Another common idiom is

r"C:\directory" "\\"

I wonder how many raw strings actually use the \"  escape productively? 
Maybe that should be deprecated too! ?  I can't think of a good and 
necessary use for it, can anyone?


This is an interesting question. I have performed some experiments. 15 
files in the stdlib (not counting the tokenizer) use \' or \" in raw 
strings. And one test (test_venv) fails because they are used in 
third-party code. All cases are in regular expressions. It is possible 
to rewrite them, but it is a less trivial task than fixing invalid escape 
sequences. So changing this would require a much longer deprecation 
period.

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GCDD6JQOPYENVDP3A62EFWHODIP2PFQM/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Rob Cliffe via Python-Dev



On 10/08/2019 23:30:18, Greg Ewing wrote:

Rob Cliffe via Python-Dev wrote:


Also, the former is simply more *informative* - it tells the reader 
that baz is expected to be a directory, not a file.


On Windows you can usually tell that from the fact that filenames
almost always have an extension, and directory names almost never
do.

Usually, but not always.  I have not infrequently used files with a 
blank extension.
I can't recall using a directory name with an extension (but I can't 
swear that I never have).

Rob Cliffe
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2LDAY5FU64X5HH3GUVGAQNHRSWEB/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 3:36 PM, Greg Ewing wrote:

Glenn Linderman wrote:


I wonder how many raw strings actually use the \"  escape 
productively? Maybe that should be deprecated too! ?  I can't think 
of a good and necessary use for it, can anyone?


Quite rare, I expect, but it's bound to break someone's code.
It might be better to introduce a new string prefix, e.g.
'v' for 'verbatim':

   v"C:\Users\Fred\"

Which is why I suggested  rr"C:\directory\", but allowed that there 
might be better spellings. I like your  v  for verbatim!
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GTOVRKM7Q4VU67KYDQF6ICU7HAJDSBRX/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Greg Ewing

Glenn Linderman wrote:


I wonder how many raw strings actually use the \"  escape productively? 
Maybe that should be deprecated too! ?  I can't think of a good and 
necessary use for it, can anyone?


Quite rare, I expect, but it's bound to break someone's code.
It might be better to introduce a new string prefix, e.g.
'v' for 'verbatim':

   v"C:\Users\Fred\"

--
Greg
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TQM37LMDVIKQ7UXLNLVMUUSF3ZYT7TYI/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Greg Ewing

Rob Cliffe via Python-Dev wrote:


Also, the former is simply more *informative* - it tells the reader that 
baz is expected to be a directory, not a file.


On Windows you can usually tell that from the fact that filenames
almost always have an extension, and directory names almost never
do.

--
Greg
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/F4Y4HNU72QOVWHCGLD74N7ZTAEJP2XBF/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/10/19, Rob Cliffe via Python-Dev  wrote:
> On 10/08/2019 11:50:35, eryk sun wrote:
>> On 8/9/19, Steven D'Aprano  wrote:
>>> I'm also curious why the string needs to *end* with a backslash. Both of
>>> these are the same path:
>>>
>>>  C:\foo\bar\baz\
>>>  C:\foo\bar\baz
>
> Also, the former is simply more *informative* - it tells the reader that
> baz is expected to be a directory, not a file.

This is an important point that I overlooked. The trailing backslash
is more than just a redundant character to inform human readers. Refer
to [MS-FSA] 2.1.5.1 "Server Requests an Open of a File" [1]. A
create/open fails with STATUS_OBJECT_NAME_INVALID if either of the
following is true:

* PathName contains a trailing backslash and
  CreateOptions.FILE_NON_DIRECTORY_FILE is
  TRUE.

* PathName contains a trailing backslash and
  StreamTypeToOpen is DataStream

For NtCreateFile or NtOpenFile (in the NT API), the
FILE_NON_DIRECTORY_FILE option restricts the call to a regular file,
and FILE_DIRECTORY_FILE restricts it to a directory. With neither
option, the call can target either a file or directory. A trailing
backslash is another information channel. It tells the filesystem that
the target has to be a directory. If we specify
FILE_NON_DIRECTORY_FILE with a trailing backslash on the name, this is
an immediate failure as an invalid name without even checking the
entry. If we specify neither option and use a trailing backslash, it's
an invalid name if the filesystem finds a regular file or data stream.
Had the call specified the FILE_DIRECTORY_FILE option, it would
instead fail with STATUS_NOT_A_DIRECTORY.
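
The rules in the paragraph above can be summarized as a toy decision function. This is only a model of the description, restricted to the case where the target is an existing regular file; the function name and the string constants are made up for this sketch and nothing here calls the NT API:

```python
STATUS_SUCCESS = "STATUS_SUCCESS"
STATUS_OBJECT_NAME_INVALID = "STATUS_OBJECT_NAME_INVALID"
STATUS_NOT_A_DIRECTORY = "STATUS_NOT_A_DIRECTORY"

def open_regular_file_status(path, option=None):
    """Model the status of opening an existing regular file at `path`.

    `option` is None, "FILE_NON_DIRECTORY_FILE" or "FILE_DIRECTORY_FILE".
    """
    trailing = path.endswith("\\")
    if trailing and option == "FILE_NON_DIRECTORY_FILE":
        # Rejected as an invalid name before the entry is even checked.
        return STATUS_OBJECT_NAME_INVALID
    if option == "FILE_DIRECTORY_FILE":
        # The caller insisted on a directory but the entry is a file.
        return STATUS_NOT_A_DIRECTORY
    if trailing:
        # Neither option given: the backslash alone demands a directory.
        return STATUS_OBJECT_NAME_INVALID
    return STATUS_SUCCESS

print(open_regular_file_status("foo\\bar\\baz", "FILE_NON_DIRECTORY_FILE"))
print(open_regular_file_status("foo\\bar\\baz\\", "FILE_NON_DIRECTORY_FILE"))
print(open_regular_file_status("foo\\bar\\baz\\"))
print(open_regular_file_status("foo\\bar\\baz\\", "FILE_DIRECTORY_FILE"))
```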

We can see this in practice in the published source for the fastfat
filesystem driver. FatCommonCreate [2] (for a create or open) has the
following code to handle the second case (in this code, an FCB is a
file control block for a regular file, and a DCB is a directory
control block):

    if (NodeType(Fcb) == FAT_NTC_FCB) {
        //
        //  Check if we were only to open a directory
        //
        if (OpenDirectory) {
            DebugTrace(0, Dbg, "Cannot open file as directory\n", 0);
            try_return( Iosb.Status = STATUS_NOT_A_DIRECTORY );
        }
        DebugTrace(0, Dbg, "Open existing fcb, Fcb = %p\n", Fcb);
        if ( TrailingBackslash ) {
            try_return( Iosb.Status = STATUS_OBJECT_NAME_INVALID );
        }

We observe the first case with a typical CreateFileW call, which uses
the option FILE_NON_DIRECTORY_FILE. In the following example "baz" is
a regular file:

>>> f = open(r'foo\bar\baz') # success
>>> try: open('foo\\bar\\baz\\')
... except OSError as e: print(e)
...
[Errno 22] Invalid argument: 'foo\\bar\\baz\\'

C EINVAL (22) is mapped from Windows ERROR_INVALID_NAME (123), which
is mapped from NT STATUS_OBJECT_NAME_INVALID (0xC0000033).

We can observe the second case with os.stat(), which calls CreateFileW
with backup semantics, which omits the FILE_NON_DIRECTORY_FILE option
in order to allow the call to open either a file or directory. In this
case the filesystem has to actually check that "baz" is a data file
before it can fail the call, as was shown in the fastfat code snippet
above:

>>> try: os.stat('foo\\bar\\baz\\')
... except OSError as e: print(e)
...
[WinError 123] The filename, directory name, or
volume label syntax is incorrect: 'foo\\bar\\baz\\'

[1] 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/8ada5fbe-db4e-49fd-aef6-20d54b748e40
[2] 
https://github.com/microsoft/Windows-driver-samples/blob/74200/filesys/fastfat/create.c#L1398
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QPDXUY4OXR2XOCNUHSKC7QRQGAXWV5WQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 12:19 PM, Guido van Rossum wrote:

Regular expressions.


I assume that is in response to the "good use for \" escape" question?

But can't you just surround them with ' instead of " ?  Or  ''' ?



On Sat, Aug 10, 2019 at 12:12 Glenn Linderman wrote:


On 8/10/2019 11:16 AM, Terry Reedy wrote:

On 8/10/2019 4:33 AM, Paul Moore wrote:


(Side issue)


This deserves its own thread.


As a Windows developer, who has seen far too many cases where use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings


As pointed out elsewhere, Raw strings have limitations, paths
ending in \ cannot be represented, and such do exist in various
situations, not all of which can be easily avoided... except by
the "extra character contortion" of "C:\directory\ "[:-1]  (does
someone know a better way?)

It would be useful to make a "really raw" string that doesn't
treat \ special in any way. With 4 different quoting possibilities
( ' " ''' """ ) there isn't really a reason to treat \ special at
the end of a raw string, except for backward compatibility.

I wonder how many raw strings actually use the \"  escape
productively? Maybe that should be deprecated too! ?  I can't
think of a good and necessary use for it, can anyone?

Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or "c:\directory\"

And that brings me to the thought that if   \e  wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define   ee"string
where \\ can only be used as an escape or escaped character, \e
means the ASCII escape character, and \ followed by a character
with no escape definition would be an error."

Of course "extended escape" could be spelled lots of different
ways too, but not the same way as "really raw" :)


2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.


Perhaps Python Setup and Usage, 3. Using Python on Windows,
should have a section of file paths, at most x.y.z, so visible in
the TOC listed by https://docs.python.org/3/using/index.html




--
--Guido (mobile)


___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BZDAXLX2IQTIUT2W47SFI2CJTZSPXY2V/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Guido van Rossum
Regular expressions.

On Sat, Aug 10, 2019 at 12:12 Glenn Linderman  wrote:

> On 8/10/2019 11:16 AM, Terry Reedy wrote:
>
> On 8/10/2019 4:33 AM, Paul Moore wrote:
>
> (Side issue)
>
>
> This deserves its own thread.
>
> As a Windows developer, who has seen far too many cases where use of
> slashes in filenames implies a Unix-based developer not thinking
> sufficiently about Windows compatibility, or where it leads to people
> hard coding '/' rather than using os.sep (or better, pathlib), I
> strongly object to this characterisation. Rather, I would simply say
> "to make Windows users more aware of the clash in usage between
> backslashes in filenames and backslashes as string escapes".
>
> There are *many* valid ways to write Windows pathnames in your code:
>
> 1. Raw strings
>
>
> As pointed out elsewhere, Raw strings have limitations, paths ending in \
> cannot be represented, and such do exist in various situations, not all of
> which can be easily avoided... except by the "extra character contortion"
> of   "C:\directory\ "[:-1]  (does someone know a better way?)
>
> It would be useful to make a "really raw" string that doesn't treat \
> special in any way. With 4 different quoting possibilities ( ' " ''' """ )
> there isn't really a reason to treat \ special at the end of a raw string,
> except for backward compatibility.
>
> I wonder how many raw strings actually use the \"  escape productively?
> Maybe that should be deprecated too! ?  I can't think of a good and
> necessary use for it, can anyone?
>
> Or invent "really raw" in some spelling, such as rr"c:\directory\"
> or e for exact, or x for exact, or  here>"c:\directory\"
>
> And that brings me to the thought that if   \e  wants to become an escape
> for escape, that maybe there should be an "extended escape" prefix... if
> you want to use more escapes, define   ee"string where \\ can only be used
> as an escape or escaped character, \e means the ASCII escape character, and
> \ followed by a character with no escape definition would be an error."
>
> Of course "extended escape" could be spelled lots of different ways too,
> but not the same way as "really raw" :)
>
> 2. Doubling the backslashes
> 3. Using pathlib (possibly with slash as a directory separator, where
> it's explicitly noted as a portable option)
> 4. Using slashes
>
> IMO, using slashes is the *worst* of these. But this latter is a
> matter of opinion - I've no objection to others believing differently,
> but I *do* object to slashes being presented as the only option, or
> the recommended option without qualification.
>
>
> Perhaps Python Setup and Usage, 3. Using Python on Windows, should have a
> section of file paths, at most x.y.z, so visible in the TOC listed by
> https://docs.python.org/3/using/index.html
>
>
>
-- 
--Guido (mobile)
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LSFNRZTMK6HLUCE7IAWKD3GCBLZ7KINQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 11:16 AM, Terry Reedy wrote:

On 8/10/2019 4:33 AM, Paul Moore wrote:


(Side issue)


This deserves its own thread.


As a Windows developer, who has seen far too many cases where use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings


As pointed out elsewhere, Raw strings have limitations, paths ending in 
\ cannot be represented, and such do exist in various situations, not 
all of which can be easily avoided... except by the "extra character 
contortion" of   "C:\directory\ "[:-1]  (does someone know a better way?)
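
For what it's worth, here is a sketch of a few spellings that produce a path ending in a backslash without the raw-string limitation getting in the way. It uses ntpath so that it behaves the same on any platform; the variable names are only for illustration, and the slicing trick is written with an r prefix, which is presumably what was intended.

```python
import ntpath

# The slicing trick, written with a raw string.
sliced = r"C:\directory\ "[:-1]

# Adjacent string literals are concatenated at compile time.
concatenated = r"C:\directory" "\\"

# Doubling every backslash in an ordinary string.
doubled = "C:\\directory\\"

# Joining with an empty last component appends the separator.
joined = ntpath.join(r"C:\directory", "")

print(sliced == concatenated == doubled == joined)
```

All four names refer to the same 13-character string, so the choice between them is a matter of readability.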


It would be useful to make a "really raw" string that doesn't treat \ 
special in any way. With 4 different quoting possibilities ( ' " ''' """ 
) there isn't really a reason to treat \ special at the end of a raw 
string, except for backward compatibility.


I wonder how many raw strings actually use the \"  escape productively? 
Maybe that should be deprecated too! ?  I can't think of a good and 
necessary use for it, can anyone?
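
As a point of reference for that question, here is a small sketch of what \" does in a raw string today: the backslash stops the quote from ending the literal, but it also stays in the string. That happens to be harmless in a regular expression, where \" simply matches a quote character. The names quote and pattern are only for illustration.

```python
import re

quote = r"\""          # backslash and quote are both kept
print(len(quote))      # 2

# In a regular expression the leftover backslash is harmless:
# \" matches a double quote, so this class matches either quote type.
pattern = re.compile(r"[\"']")
print(pattern.findall("say \"hi\" or 'bye'"))
```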


Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or here>"c:\directory\"


And that brings me to the thought that if   \e  wants to become an 
escape for escape, that maybe there should be an "extended escape" 
prefix... if you want to use more escapes, define   ee"string where \\ 
can only be used as an escape or escaped character, \e means the ASCII 
escape character, and \ followed by a character with no escape 
definition would be an error."


Of course "extended escape" could be spelled lots of different ways too, 
but not the same way as "really raw" :)



2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.


Perhaps Python Setup and Usage, 3. Using Python on Windows, should 
have a section of file paths, at most x.y.z, so visible in the TOC 
listed by https://docs.python.org/3/using/index.html




___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5MZAXJJYKNMQAS63QW4HS2TUPMQH7LSL/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Rob Cliffe via Python-Dev



On 10/08/2019 11:50:35, eryk sun wrote:

On 8/9/19, Steven D'Aprano  wrote:

I'm also curious why the string needs to *end* with a backslash. Both of
these are the same path:

 C:\foo\bar\baz\
 C:\foo\bar\baz

Also, the former is simply more *informative* - it tells the reader that
baz is expected to be a directory, not a file.

Rob Cliffe

The above two cases are equivalent. But that's not the case for the
root directory. Unlike Unix, filesystem namespaces are implemented
directly on devices. For example, "//./C:" might resolve to a volume
device such as "\\Device\\HarddiskVolume2". With a trailing slash
added, "//./C:/" resolves to "\\Device\\HarddiskVolume2\\", which is
the root directory of the mounted filesystem on the volume.

Also, as a classic DOS path, "C:" without a trailing slash expands to
the working directory on drive "C:". The system runtime library looks
for this path in a hidden environment variable named "=C:". The
Windows API never sets these hidden "=X:" drive variables. The C
runtime sets them, as does Python's os.chdir.

Some volume-management functions require a trailing slash or
backslash, such as GetVolumeInformationW [1].
GetVolumeNameForVolumeMountPointW [2] actually requires it to be a
trailing backslash. It will not accept a trailing forward slash such
as "C:\\Mount\\Volume/" (a bug since Windows 2000). The volume name
(e.g. "?\\Volume{----}\\")
returned by the latter includes a trailing backslash, which must be
present in the target path in order for a mountpoint to function
properly as a directory, else it would resolve to the volume device
instead of the root directory.

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumeinformationw
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumenameforvolumemountpointw


If they're Windows developers, they ought to be aware that the Windows
file system API allows / anywhere you can use \ and it is the
common convention in Python to use forward slashes.

The Windows file API actually does not allow slash to be used anywhere
that we can use backslash. It's usually allowed, but not always. For
the most part, the conditions where forward slash is not supported are
intentional.

Windows replaces forward slash with backslash in normal DOS paths and
normal device paths. But sometimes we have to use a special form of
device path that bypasses normalization. A path that isn't normalized
can only use backslash as the path separator. For example, the most
common case is that the process doesn't have long paths enabled. In
this case we're limited to MAX_PATH, which limits file paths to a
paltry 259 characters (sans the terminating null); the current
directory to 258 characters (sans a trailing backslash and null); and
the path of a new directory to 247 characters (subtract 12 from 259 to
leave space for an 8.3 filename). By skipping DOS normalization, we
can access a path with up to about 32,750 characters (i.e. 32,767 sans
the length of the device name in the final NT path under
"\\Device\\").

(Long normalized paths are available starting in Windows 10, but the
system policy that allows this is disabled by default, and even if
enabled, each application has to declare itself to be long-path aware
in its manifest. This is declared for python[w].exe in Python 3.6+.)

A device path is an explicit reference to a user's local device
directory (in the object namespace), which shadows the global device
directory. In NT, this directory is aliased to a special "\\??\\"
prefix (backslash only). A local device directory is created for each
logon session (not terminal session) by the security system that runs
in terminal session 0 (i.e. the system services session). The
per-logon directory is located at
"\\Sessions\\0\\DosDevices\\<Logon Session ID>". In the Windows API,
it's accessible as "//?/" or "//./",
or with any mix of forward slashes or backslashes, but only the
all-backslash form is special-cased to bypass the normalization step.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IVNUAUUHURCS4P77ZVFFK3H665ZKXGBC/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Terry Reedy

On 8/10/2019 4:33 AM, Paul Moore wrote:


(Side issue)


This deserves its own thread.


As a Windows developer, who has seen far too many cases where use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings
2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.


Perhaps Python Setup and Usage, 3. Using Python on Windows, should have 
a section of file paths, at most x.y.z, so visible in the TOC listed by 
https://docs.python.org/3/using/index.html


--
Terry Jan Reedy
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SH3M5GGHJPIMKVTEYI6FFBYWHVZT7O64/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 7:03 AM, Paul Moore wrote:

On Sat, 10 Aug 2019 at 12:06, Chris Angelico  wrote:

On Sat, Aug 10, 2019 at 6:39 PM Paul Moore  wrote:

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings
2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.

Please expand on why this is the worst?

I did say it was a matter of opinion, so I'm not going to respond if
people say that any of the following is "wrong", but since you asked:

1. Backslash is the native separator, whereas slash is not (see eryk
sun's post for *way* more detail).
2. People who routinely use slash have a tendency to forget to use
os.sep rather than a literal slash in places where it *does* matter.
3. Using slash, in my experience, ends up with paths with "mixed"
separators (os.path.join("C:/work/apps", "foo") ->
'C:/work/apps\\foo') which are messy to deal with, and ugly for the
user.
4. If a path with slashes is displayed directly to the user without
normalisation, it looks incorrect and can confuse users who are only
used to "native" Windows programs.

Etc.
Not to mention the problem of passing paths with / to other windows 
programs via system or subprocess.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/A7MBGUBTRNLZ5UWCMS4NHYAFGQC6MNQJ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Paul Moore
On Sat, 10 Aug 2019 at 12:06, Chris Angelico  wrote:
>
> On Sat, Aug 10, 2019 at 6:39 PM Paul Moore  wrote:
> > There are *many* valid ways to write Windows pathnames in your code:
> >
> > 1. Raw strings
> > 2. Doubling the backslashes
> > 3. Using pathlib (possibly with slash as a directory separator, where
> > it's explicitly noted as a portable option)
> > 4. Using slashes
> >
> > IMO, using slashes is the *worst* of these. But this latter is a
> > matter of opinion - I've no objection to others believing differently,
> > but I *do* object to slashes being presented as the only option, or
> > the recommended option without qualification.
>
> Please expand on why this is the worst?

I did say it was a matter of opinion, so I'm not going to respond if
people say that any of the following is "wrong", but since you asked:

1. Backslash is the native separator, whereas slash is not (see eryk
sun's post for *way* more detail).
2. People who routinely use slash have a tendency to forget to use
os.sep rather than a literal slash in places where it *does* matter.
3. Using slash, in my experience, ends up with paths with "mixed"
separators (os.path.join("C:/work/apps", "foo") ->
'C:/work/apps\\foo') which are messy to deal with, and ugly for the
user.
4. If a path with slashes is displayed directly to the user without
normalisation, it looks incorrect and can confuse users who are only
used to "native" Windows programs.

Etc.
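
Point 3 is easy to reproduce on any platform with ntpath, the module behind os.path on Windows. The sketch below also shows two ways to end up with consistent separators; the variable name mixed is only for illustration.

```python
import ntpath
from pathlib import PureWindowsPath

# join() keeps the existing slashes and adds a backslash.
mixed = ntpath.join("C:/work/apps", "foo")
print(mixed)                                    # C:/work/apps\foo

# normpath() converts every separator to a backslash.
print(ntpath.normpath(mixed))                   # C:\work\apps\foo

# pathlib normalises separators when the path is converted to a string.
print(PureWindowsPath("C:/work/apps") / "foo")  # C:\work\apps\foo
```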

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QNAZ4G7VCCBZSFJLUCGH6NTTGW726R6G/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/10/19, eryk sun  wrote:
>
> The per-logon directory is located at
> "\\Sessions\\0\\DosDevices\\<Logon Session ID>". In the Windows API,
> it's accessible as "//?/" or "//./",
> or with any mix of forward slashes or backslashes, but only the
> all-backslash form is special-cased to bypass the normalization step.

Correction: I slipped up in that last sentence. Only the all-backslash
form that's in the "?" namespace bypasses normalization, as most
Windows users should at least have seen in passing. These special
device paths pop up here and there. For example, r'\\?\C:\Temp\spam. .
.' allows creating or opening a file named "spam. . .", which the
Windows API would normalize as "spam". But I don't recommend
sidestepping the normal rules -- except for the path length limit
because there are ways to make long paths conveniently accessible
(e.g. symbolic links, bind-like mountpoints, and subst drives).
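
A minimal sketch of how a program might build such a path follows. The helper extended_path and the constant EXTENDED_PREFIX are hypothetical names, not part of any library, and the helper only handles absolute drive-letter paths; UNC and relative paths are rejected rather than converted. Because a non-normalized path accepts only backslashes, forward slashes are replaced first.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"  # the four characters \\?\


def extended_path(path):
    """Prefix an absolute drive-letter path so that it skips normalization.

    Hypothetical helper: UNC and relative paths raise ValueError.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended form
    path = path.replace("/", "\\")
    drive, rest = ntpath.splitdrive(path)
    if len(drive) != 2 or drive[1] != ":" or not rest.startswith("\\"):
        raise ValueError(f"not an absolute drive-letter path: {path!r}")
    return EXTENDED_PREFIX + path


print(extended_path("C:/Temp/spam. . ."))
```

The helper does no other clean-up, so "." and ".." components would be passed through literally, which is the whole point of bypassing normalization.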

Sometimes people also come across "\\??\\" paths and come to the
mistaken conclusion that these can be used in Windows API programs.
No, they're for NT. The runtime library mangles them, e.g.
nt._getfullpathname(r'\??\C:') == 'C:\\??\\C:'.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VANNT2SIH7EBPEOUC6M7HI7PYASJPYC7/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Rob Cliffe via Python-Dev



On 06/08/2019 23:41:25, Greg Ewing wrote:

Rob Cliffe via Python-Dev wrote:


Sorry, that won't work.  Strings are parsed at compile time, open() 
is executed at run-time.


It could check for control characters, which are probably the result
of a backslash accident. Maybe even auto-correct them...


By "It", do you mean open() ?  If so:
It already checks for control characters, at least with Python 2.7 on 
Windows:


>>> open('mydir\test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] invalid mode ('r') or filename: 'mydir\test'

As for auto-correct (presumably "\a" to "\\a", "\b" to "\\b" etc.), I 
hope you're not serious.
"In the face of gibberish, refuse the temptation to show how smart your 
guessing is."

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UK46EASIZVFTIQPORH7AG3EFB522NFI3/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Chris Angelico
On Sat, Aug 10, 2019 at 6:39 PM Paul Moore  wrote:
> There are *many* valid ways to write Windows pathnames in your code:
>
> 1. Raw strings
> 2. Doubling the backslashes
> 3. Using pathlib (possibly with slash as a directory separator, where
> it's explicitly noted as a portable option)
> 4. Using slashes
>
> IMO, using slashes is the *worst* of these. But this latter is a
> matter of opinion - I've no objection to others believing differently,
> but I *do* object to slashes being presented as the only option, or
> the recommended option without qualification.

Please expand on why this is the worst?

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PXVO7OT4EK2GRDC5DM6JXMP3WBOVC7DC/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/9/19, Steven D'Aprano  wrote:
>
> I'm also curious why the string needs to *end* with a backslash. Both of
> these are the same path:
>
> C:\foo\bar\baz\
> C:\foo\bar\baz

The above two cases are equivalent. But that's not the case for the
root directory. Unlike Unix, filesystem namespaces are implemented
directly on devices. For example, "//./C:" might resolve to a volume
device such as "\\Device\\HarddiskVolume2". With a trailing slash
added, "//./C:/" resolves to "\\Device\\HarddiskVolume2\\", which is
the root directory of the mounted filesystem on the volume.

Also, as a classic DOS path, "C:" without a trailing slash expands to
the working directory on drive "C:". The system runtime library looks
for this path in a hidden environment variable named "=C:". The
Windows API never sets these hidden "=X:" drive variables. The C
runtime sets them, as does Python's os.chdir.
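
The difference between "C:" and the root directory is also visible in pathlib's purely lexical model, as the sketch below shows. It runs on any platform because PureWindowsPath never touches the filesystem; for the same reason it knows nothing about the hidden "=C:" variables and leaves a drive-relative path unresolved.

```python
from pathlib import PureWindowsPath

drive_relative = PureWindowsPath("C:")
root_dir = PureWindowsPath("C:\\")

# "C:" has a drive but no root, so it is not an absolute path.
print(drive_relative.is_absolute())   # False
print(root_dir.is_absolute())         # True

# Joining onto "C:" keeps the result relative to the drive's working directory.
print(drive_relative / "foo")         # C:foo
print(root_dir / "foo")               # C:\foo
```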

Some volume-management functions require a trailing slash or
backslash, such as GetVolumeInformationW [1].
GetVolumeNameForVolumeMountPointW [2] actually requires it to be a
trailing backslash. It will not accept a trailing forward slash such
as "C:\\Mount\\Volume/" (a bug since Windows 2000). The volume name
(e.g. "?\\Volume{----}\\")
returned by the latter includes a trailing backslash, which must be
present in the target path in order for a mountpoint to function
properly as a directory, else it would resolve to the volume device
instead of the root directory.

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumeinformationw
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumenameforvolumemountpointw

> If they're Windows developers, they ought to be aware that the Windows
> file system API allows / anywhere you can use \ and it is the
> common convention in Python to use forward slashes.

The Windows file API actually does not allow slash to be used anywhere
that we can use backslash. It's usually allowed, but not always. For
the most part, the conditions where forward slash is not supported are
intentional.

Windows replaces forward slash with backslash in normal DOS paths and
normal device paths. But sometimes we have to use a special form of
device path that bypasses normalization. A path that isn't normalized
can only use backslash as the path separator. For example, the most
common case is that the process doesn't have long paths enabled. In
this case we're limited to MAX_PATH, which limits file paths to a
paltry 259 characters (sans the terminating null); the current
directory to 258 characters (sans a trailing backslash and null); and
the path of a new directory to 247 characters (subtract 12 from 259 to
leave space for an 8.3 filename). By skipping DOS normalization, we
can access a path with up to about 32,750 characters (i.e. 32,767 sans
the length of the device name in the final NT path under
"\\Device\\").

(Long normalized paths are available starting in Windows 10, but the
system policy that allows this is disabled by default, and even if
enabled, each application has to declare itself to be long-path aware
in its manifest. This is declared for python[w].exe in Python 3.6+.)
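
The legacy limits quoted above all follow from MAX_PATH, which is 260 and counts the terminating null. The arithmetic can be sketched as follows; the variable names and the helper fits_without_long_paths are hypothetical and only restate the numbers given in the text.

```python
MAX_PATH = 260  # Windows constant; counts the terminating null

max_file_path = MAX_PATH - 1        # drop the terminating null
max_current_dir = MAX_PATH - 2      # drop a trailing backslash and the null
max_new_dir = max_file_path - 12    # leave room for an 8.3 name (8 + 1 + 3)


def fits_without_long_paths(path, new_directory=False):
    """Check a path length against the legacy limits (hypothetical helper)."""
    limit = max_new_dir if new_directory else max_file_path
    return len(path) <= limit


print(max_file_path, max_current_dir, max_new_dir)  # 259 258 247
```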

A device path is an explicit reference to a user's local device
directory (in the object namespace), which shadows the global device
directory. In NT, this directory is aliased to a special "\\??\\"
prefix (backslash only). A local device directory is created for each
logon session (not terminal session) by the security system that runs
in terminal session 0 (i.e. the system services session). The
per-logon directory is located at
"\\Sessions\\0\\DosDevices\\<Logon Session ID>". In the Windows API,
it's accessible as "//?/" or "//./",
or with any mix of forward slashes or backslashes, but only the
all-backslash form is special-cased to bypass the normalization step.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3SDFM2EKFO3UNTATS7KVBY2WOUTFMAF5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Paul Moore
On Sat, 10 Aug 2019 at 00:36, Steven D'Aprano  wrote:
> 2. To strongly discourage newbie Windows developers from hard-coding
> paths using backslashes, but to use forward slashes instead.

(Side issue)

As a Windows developer, who has seen far too many cases where use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings
2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.
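
For concreteness, the four spellings look like this. The sketch uses PureWindowsPath so that it runs on any platform, and the variable names are only for illustration. The first three produce the same backslash-separated string; the fourth is a different string that Windows usually, but not always, treats as the same path.

```python
from pathlib import PureWindowsPath

raw = r"C:\work\apps"                           # 1. raw string
doubled = "C:\\work\\apps"                      # 2. doubled backslashes
with_pathlib = PureWindowsPath("C:/work/apps")  # 3. pathlib
slashes = "C:/work/apps"                        # 4. plain slashes

print(raw == doubled)             # True
print(str(with_pathlib) == raw)   # True
print(slashes == raw)             # False
```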

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FZABAKCBZY72FKFRPK3OXPLKSQ62JZ6N/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Steve Holden
While not a total solution, it seems like it might be worthwhile forcing
flake8 or similar checks when uploading PyPI modules.

That would catch the illegal escape sequences where it really matters -
before they enter the ecosystem.

(general) fathead:pyxll-www sholden$ cat t.py
"Docstring with illegal \escape sequence"
(general) fathead:pyxll-www sholden$ flake8 t.py
t.py:1:25: W605 invalid escape sequence '\e'

While this won't mitigate the case for existing packages, it should reduce
the number of packages containing potentially erroneous string constants,
preparing the ground for the eventual introduction of the syntax error.
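
A check along these lines does not strictly need flake8; the compiler's own warning can be captured. The sketch below is one possible approach, not how flake8 or PyPI work, and has_invalid_escape is a hypothetical name. It relies on the warning text containing the phrase "invalid escape sequence", which is an assumption about the interpreter's message wording.

```python
import warnings


def has_invalid_escape(source):
    """Compile source and report whether the compiler warned about an escape."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # record every warning, even repeats
        compile(source, "<check>", "exec")
    return any("invalid escape sequence" in str(w.message) for w in caught)


print(has_invalid_escape('"Docstring with illegal \\escape sequence"\n'))
print(has_invalid_escape('r"Raw docstring with \\escape sequence"\n'))
```

The first call is expected to report the problem and the second is not, since a raw string has no escape sequences to get wrong.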

Steve Holden


On Sat, Aug 10, 2019 at 8:07 AM Serhiy Storchaka 
wrote:

> 10.08.19 02:04, Gregory P. Smith wrote:
> > I've merged the PR reverting the behavior in 3.8 and am doing the same
> > in the master branch.
>
> I was going to rebase it onto master and go through the normal
> backporting process if we decide that DeprecationWarning should be in
> master. I was waiting for the end of the discussion.
>
> > Recall the nightmare caused by md5.py and sha.py DeprecationWarning's in
> > 2.5...  this would be similar.
>
> It is very different because DeprecationWarning for md5.py and sha.py is
> emitted at runtime.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/24ID6EF6ESG64B6VFXVRL4XNWP5I7ITW/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Serhiy Storchaka

10.08.19 02:04, Gregory P. Smith wrote:
I've merged the PR reverting the behavior in 3.8 and am doing the same 
in the master branch.


I was going to rebase it onto master and follow the normal backporting
process if we decide that DeprecationWarning should be in master. I was
waiting for the end of the discussion.


Recall the nightmare caused by md5.py and sha.py DeprecationWarning's in 
2.5...  this would be similar.


It is very different because DeprecationWarning for md5.py and sha.py is 
emitted at runtime.

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/H5VXWS6UT2OZBTXG7HUERKAQQIQ4BYEA/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Serhiy Storchaka

09.08.19 19:39, Steve Dower wrote:
I also posted another possible option that helps solve the real problem 
faced by users, and not just the "we want to have a warning" problem 
that is purely ours.


Warnings solve two problems:

* Teaching users that a backslash has special meaning and should be 
escaped unless it is used for special meaning.


* Avoiding breakage or newly introduced bugs if we add new escape
sequences (like \e).


* change the SyntaxWarning into a default-silenced one that fires 
every time a .pyc is loaded (this is the hard part, but it's doable)


It was considered an advantage that these warnings are shown only once
at compile time. So they will be shown to the author of the code, but
the user of the code will not see them (except at installation time).
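
The compile-time-only behaviour described here can be observed directly. The following is a minimal sketch, assuming a Python version in which an unrecognized escape is still a warning rather than an error; the variable names are illustrative only. It compiles a small piece of source once and then runs the resulting code object twice.

```python
import warnings

# Source text of a one-line module; the doubled backslash below yields
# a single backslash, so the module contains the invalid escape \d.
source = 'pattern = "\\d+"\n'

# Compiling is where the warning is produced.
with warnings.catch_warnings(record=True) as at_compile:
    warnings.simplefilter("always")
    code = compile(source, "<demo>", "exec")

# Running the already-compiled code (as happens when a cached .pyc is
# loaded) produces no further warning, however often it is executed.
with warnings.catch_warnings(record=True) as at_run:
    warnings.simplefilter("always")
    namespace = {}
    exec(code, namespace)
    exec(code, namespace)

compile_count = len(at_compile)
run_count = len(at_run)
```

Only the compile step records a warning, which is why someone who only ever runs cached bytecode never sees it.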


Actually we need to distinguish the author from the user of the code
and show warnings only to the author. Using .pyc files was just a
heuristic: the author compiles the Python code, and the user uses the
compiled .pyc files. It would be nice to have a more reliable way to
determine who owns the code. This relates not only to SyntaxWarnings,
but also to runtime DeprecationWarnings. Maybe silence warnings only
for read-only files and make files installed by pip read-only?


* change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to 
explicitly warn when the path contains control characters


This can cause additional harm. Currently you get the expected
FileNotFoundError when a bad user-specified path is used; it can be
caught and handled. But with warnings you will get either noise in the
output or an unexpected unhandled error.
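
For reference, the detection step in the quoted proposal is small. This is only a sketch: `control_characters` is a hypothetical helper, not an existing pathlib or os function. What should happen after detection (warn, raise, or nothing) is exactly the point under dispute.

```python
def control_characters(path):
    """Return the control characters (code points below 32) found in `path`."""
    return [ch for ch in path if ord(ch) < 32]


intended = r"C:\temp\new"  # raw string: the backslashes are kept as written
mistyped = "C:\temp\new"   # \t and \n are valid escapes: a tab and a newline

found_in_intended = control_characters(intended)
found_in_mistyped = control_characters(mistyped)
```

Note that the mistyped literal triggers no invalid-escape warning at all, because both of its escapes are recognized ones.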


* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to 
append (or chain) an extra message when either of the filenames 
contains control characters (or change OSError to do it, or the 
default sys.excepthook)


I do not understand what goal will be achieved by this.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BCAOEGQYK5KYAMPDQ5O6KWGCOOQUJ6UV/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Glenn Linderman

On 8/9/2019 3:56 PM, Steven D'Aprano wrote:

I'm not trying to be confrontational, I'm trying to understand your
use-case(s) and see if it would be broken by the planned change to
string escapes.


Yeah, that's fine. Sometimes it is hard to communicate via email (versus 
saying a lot).



On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:

On 8/9/2019 2:53 PM, Steven D'Aprano wrote:

On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:


The reason I never use raw strings is in the documentation, it is
because \ still has a special meaning, and the first several times I
felt the need for raw strings, it was for directory names that wanted to
end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring

I didn't mention docstring.  I just wanted a string with a path name
ending in \.

You said you never used raw strings in the documentation. I read that as
doc strings. What sort of documentation are you writing that isn't a doc
string but is inside your .py files where the difference between raw and
regular strings is meaningful?


No, what I said was that the reason is in the documentation. The reason
that I don't use raw strings is in the Python documentation. I don't
claim to use raw strings for documentation I write. The reason is that
\" to end the string doesn't work, and the first good-sounding
justification for using raw strings that I stumbled across was to avoid
"c:\\directory\\" in favor of r"c:\directory\", but that doesn't work,
and neither does r"c:\directory\\". Since then, I have not found any
other compelling need for raw strings that overcomes that deficiency...
the benefit of raw strings is that you don't have to double the \\. But
the benefit is contradicted by not being able to use one at the end of
the string. If you can't use it at the end of the string, the utility of
not doubling them in the middle of the string is just too confusing to
make it worth figuring out the workarounds when you have a string full
of \ that happens to end in \. It's just easier to remember the "always
double \" rule than to remember the extra "but if your string containing
\ doesn't have one at the end you can get away with using a raw string
and not doubling the \" rule.
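
The three spellings mentioned above can be checked mechanically. In this sketch `literal_value` is a hypothetical helper that evaluates the source text of a literal. Because each argument is itself written as an ordinary string, the backslashes appear doubled, and the comment beside each call shows the literal being tested.

```python
import ast


def literal_value(literal_source):
    """Evaluate the source text of a literal; return None if it does not parse."""
    try:
        return ast.literal_eval(literal_source)
    except SyntaxError:
        return None


single = literal_value('r"c:\\directory\\"')       # r"c:\directory\"
double = literal_value('r"c:\\directory\\\\"')     # r"c:\directory\\"
escaped = literal_value('"c:\\\\directory\\\\"')   # "c:\\directory\\"

# One possible workaround: a raw literal followed by an adjacent
# ordinary literal that supplies the trailing backslash.
workaround = r"c:\directory" "\\"
```

The first form fails to parse, the second parses but ends in two backslashes, and only the fully doubled form and the workaround yield a path with exactly one trailing backslash.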



Windows users are used to seeing backslashes in paths, I don't care to
be the one to explain why my program uses / and all the rest use \.

If you don't use raw strings for paths, you get to explain why your
program uses \\ and all the rest use \ *wink*

Wrong. Users don't look at the source code. They look at the output. I
also don't want to have to write code to convert /-laden paths to
\-laden paths when I display them to the user.
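
As a side note on the conversion cost, the standard library already performs this conversion. A small sketch, using a made-up path; both `PureWindowsPath` and `ntpath` behave the same on any platform:

```python
import ntpath
from pathlib import PureWindowsPath

# Hypothetical path written with forward slashes in the source code.
source_form = "C:/Users/demo/report.txt"

# Both calls produce the backslash form for display.
shown_by_pathlib = str(PureWindowsPath(source_form))
shown_by_ntpath = ntpath.normpath(source_form)
```

Whether routing every displayed path through such a call is acceptable is a separate question from whether it is possible.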




If they're Windows end users, they won't be reading your source code and
will never know how you represent hard-coded paths in the source code.


They will if I display the path as a default value for an argument, or 
show them the path for other reasons, or if the path shows up in an 
exception message.




If they're Windows developers, they ought to be aware that the Windows
file system API allows / anywhere you can use \ and it is the
common convention in Python to use forward slashes.


This, we can agree on.


I'm also curious why the string needs to *end* with a backslash. Both of
these are the same path:

 C:\foo\bar\baz\
 C:\foo\bar\baz


Sure. But only one of them can be used successfully with + filename
(for example).


___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OYRSO4WHUFA7Q34HJTBIMQL337JTA5RX/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Glenn Linderman

On 8/9/2019 4:08 PM, MRAB wrote:

On 2019-08-09 23:56, Steven D'Aprano wrote:

I'm not trying to be confrontational, I'm trying to understand your
use-case(s) and see if it would be broken by the planned change to
string escapes.


On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:

On 8/9/2019 2:53 PM, Steven D'Aprano wrote:
>On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:
>
>>The reason I never use raw strings is in the documentation, it is
>>because \ still has a special meaning, and the first several times I
>>felt the need for raw strings, it was for directory names that wanted to
>>end with \ and couldn't.
>Can you elaborate? I find it unlikely that I would ever want a docstring


I didn't mention docstring.  I just wanted a string with a path name 
ending in \.


You said you never used raw strings in the documentation. I read that as
doc strings. What sort of documentation are you writing that isn't a doc
string but is inside your .py files where the difference between raw and
regular strings is meaningful?


Windows users are used to seeing backslashes in paths, I don't care 
to be the one to explain why my program uses / and all the rest use \.


If you don't use raw strings for paths, you get to explain why your
program uses \\ and all the rest use \ *wink*

If they're Windows end users, they won't be reading your source code and
will never know how you represent hard-coded paths in the source code.

If they're Windows developers, they ought to be aware that the Windows
file system API allows / anywhere you can use \ and it is the
common convention in Python to use forward slashes.

I'm also curious why the string needs to *end* with a backslash. Both of
these are the same path:

 C:\foo\bar\baz\
 C:\foo\bar\baz



The only time it's required is for the root directory of a drive:

C:\


That's not the only time it's required, but it is a case that is far
harder to specify in other ways.  It's required any time you want to
say + filename without writing + "\\" + filename, or
os.path.join("C:\\", filename).
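
To make the cases concrete, here is a sketch using `ntpath`, the Windows flavour of `os.path`, which can be imported on any platform. The directory and file names are made up.

```python
import ntpath  # Windows implementation of os.path

# For an ordinary directory, join() supplies the separator if it is missing.
with_sep = ntpath.join("C:\\foo\\bar\\baz\\", "file.txt")
without_sep = ntpath.join("C:\\foo\\bar\\baz", "file.txt")

# For a bare drive, the trailing backslash changes the meaning:
# "C:\" is the root directory, "C:" is the current directory on drive C.
root = ntpath.join("C:\\", "file.txt")
drive_relative = ntpath.join("C:", "file.txt")
```

So with `join` the trailing backslash is optional everywhere except the drive root, while plain string concatenation needs it in every case.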
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LXM72OGMFTNJP3NQPITJWWGB6ITNRBH4/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Glenn Linderman

On 8/9/2019 4:07 PM, Gregory P. Smith wrote:



On Fri, Aug 9, 2019 at 11:37 AM Eric V. Smith wrote:


On 8/9/2019 2:28 PM, Jonathan Goble wrote:
> On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan wrote:
>> I find the "Our deprecation warnings were even less visible than
>> normal" argument for extending the deprecation period compelling.
> Outsider's 2 cents from reading this discussion (with no personal
> experience with this warning):
>
> I am perplexed at the opinion, seemingly espoused by multiple people
> in this thread, that because a major part of the problem is that the
> warnings were not visible enough, somehow the proposed solution is
> making them not visible enough again? It's too late, in my
> understanding, in the 3.8 cycle to add a new feature like a change to
> how these warnings are produced (it seems a significant change to the
> .pyc structure is needed to emit them at runtime), so this supposed
> "solution" is nothing but kicking the can down the road. When 3.9
> rolls around, public exposure to the problem of invalid escape
> sequences will still be approximately what it is now (because if
> nobody saw the warnings in 3.7, they certainly won't see them in 3.8
> with this "fix"), so you'll end up with the same complaints about
> SyntaxWarning that started this discussion, end up back on
> DeprecationWarning for 3.9 (hopefully with support for emitting them
> at runtime instead of just compile-time), then have to wait until
> 3.10/4.0 for SyntaxWarning and eventually the next version to actually
> make them errors.

Yes, I think that's the idea: Deprecation warning in 3.9, but more
visible than what 3.7 has. That is, not just at compile time but at run
time. What's required to make that happen is an open question.


I've lost track of who suggested what in this thread, but yes, that
concept has been rolling over in my mind as a potentially good idea
after someone suggested it.  Compile time warnings should turn into
bytecode for a warnings.warn call in the generated pyc.  I haven't
spent time trying to reason if that actually addresses the real issues
we're having moving forward with a syntax warning change though. A
reasonable feature to ask for in 3.9 or later perhaps.


The documentation actually claims it was deprecated in version 3.6. So 
it has already been 2 releases worth of deprecation, visible warning or not.


Ship it.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YYSC5CDJWOF24AUWC4IHJG45COHOTHW3/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Steven D'Aprano
On Fri, Aug 09, 2019 at 02:28:13PM -0400, Jonathan Goble wrote:

> I am perplexed at the opinion, seemingly espoused by multiple people
> in this thread, that because a major part of the problem is that the
> warnings were not visible enough, somehow the proposed solution is
> making them not visible enough again?

Making the warnings invisible by default is only the first step, not the 
entire solution.

We don't break backwards compatibility lightly, and the current 
behaviour is not an accident, it is a documented feature which 
developers are entitled to rely on.

We are choosing to change that behaviour, breaking backwards 
compatibility, to the inconvenience of end-users, library authors, 
and developers on Mac/Unix/Linux, for two benefits:

1. To possibly allow the addition of new escape sequences such as \e 
some time in the future.

2. To strongly discourage newbie Windows developers from hard-coding 
paths using backslashes, but to use forward slashes instead.


Especially on Python-Ideas, time and time again we hear the mantra that 
we should only break backwards compatibility if the benefit strongly 
outweighs the cost of change. Raymond has given compelling (to me at 
least) testimony that right now, the cost of change is far too high for 
the two minor benefits gained.

So *right now*, it looks like we ought to be prepared to back away from 
the change altogether. We thought that the balance would be:

"it will be a little bit painful, but the benefit will outweigh the pain"

justifying breaking backwards compatibility, but we have found that the 
pain is greater than expected. If we cannot reduce the pain, and move 
the balance into the "nett positive" rather than the "nett negative" we 
have right now, we ought to cancel the deprecation.

Making the deprecation silent by default will reduce the pain. That's 
the first step. Pushing the deprecation schedule back a release or more 
will give us time to rethink the deprecation process, fix the technical 
issues we discovered about SyntaxWarnings, and give library authors time 
to eliminate the warnings from their libraries.


> It's too late, in my
> understanding, in the 3.8 cycle to add a new feature like a change to
> how these warnings are produced (it seems a significant change to the
> .pyc structure is needed to emit them at runtime), so this supposed
> "solution" is nothing but kicking the can down the road.

Is that a problem? Any deadline we have to make unrecognised backslash 
escapes an error is a self-imposed deadline. We lived with this feature 
for more than a quarter of a century, we can keep kicking the can down 
the road until the benefit outweighs the pain.

If that means "forever", then I personally will be sad, but so be it.

However, even if it is too late to add any new tools or features to 
Python 3.8 (and that's not clear: this won't be a *language* change, so 
the feature freeze may not apply) all is not lost.

We're aware of the problem, and can start pointing library authors at 
this thread, and the relevant b.p.o. ticket, and push them in the right 
direction.

Raymond mentioned two libraries by name, bottle and docutils, and Matt 
scanned the top 100 packages on PyPI. That's a good place to start for 
anyone wanting to contribute: raise bug reports on the individual 
library trackers. (If they haven't already been raised.)

https://github.com/bottlepy/bottle/issues

(I'd do that myself except I have technical problems using Github.)

I have reported it to docutils:

https://sourceforge.net/p/docutils/bugs/373/


[...]
> So put these warnings front and center now
> so package and code maintainers actually see it

The problem is that this seriously and negatively affects the experience 
for many end-users. That's what we're trying to prevent, or at least 
mitigate.



-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B7FH5IGUX24J7X7QEANAOSTIKTOHZJ5E/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread MRAB

On 2019-08-09 23:56, Steven D'Aprano wrote:

I'm not trying to be confrontational, I'm trying to understand your
use-case(s) and see if it would be broken by the planned change to
string escapes.


On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:

On 8/9/2019 2:53 PM, Steven D'Aprano wrote:
>On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:
>
>>The reason I never use raw strings is in the documentation, it is
>>because \ still has a special meaning, and the first several times I
>>felt the need for raw strings, it was for directory names that wanted to
>>end with \ and couldn't.
>Can you elaborate? I find it unlikely that I would ever want a docstring

I didn't mention docstring.  I just wanted a string with a path name 
ending in \.


You said you never used raw strings in the documentation. I read that as
doc strings. What sort of documentation are you writing that isn't a doc
string but is inside your .py files where the difference between raw and
regular strings is meaningful?


Windows users are used to seeing backslashes in paths, I don't care to 
be the one to explain why my program uses / and all the rest use \.


If you don't use raw strings for paths, you get to explain why your
program uses \\ and all the rest use \ *wink*

If they're Windows end users, they won't be reading your source code and
will never know how you represent hard-coded paths in the source code.

If they're Windows developers, they ought to be aware that the Windows
file system API allows / anywhere you can use \ and it is the
common convention in Python to use forward slashes.

I'm also curious why the string needs to *end* with a backslash. Both of
these are the same path:

 C:\foo\bar\baz\
 C:\foo\bar\baz



The only time it's required is for the root directory of a drive:

C:\
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GIPRSYAINB4NE4IORCYRTYN7TZWMCZ34/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Gregory P. Smith
On Fri, Aug 9, 2019 at 11:37 AM Eric V. Smith  wrote:

> On 8/9/2019 2:28 PM, Jonathan Goble wrote:
> > On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan  wrote:
> >> I find the "Our deprecation warnings were even less visible than
> >> normal" argument for extending the deprecation period compelling.
> > Outsider's 2 cents from reading this discussion (with no personal
> > experience with this warning):
> >
> > I am perplexed at the opinion, seemingly espoused by multiple people
> > in this thread, that because a major part of the problem is that the
> > warnings were not visible enough, somehow the proposed solution is
> > making them not visible enough again? It's too late, in my
> > understanding, in the 3.8 cycle to add a new feature like a change to
> > how these warnings are produced (it seems a significant change to the
> > .pyc structure is needed to emit them at runtime), so this supposed
> > "solution" is nothing but kicking the can down the road. When 3.9
> > rolls around, public exposure to the problem of invalid escape
> > sequences will still be approximately what it is now (because if
> > nobody saw the warnings in 3.7, they certainly won't see them in 3.8
> > with this "fix"), so you'll end up with the same complaints about
> > SyntaxWarning that started this discussion, end up back on
> > DeprecationWarning for 3.9 (hopefully with support for emitting them
> > at runtime instead of just compile-time), then have to wait until
> > 3.10/4.0 for SyntaxWarning and eventually the next version to actually
> > make them errors.
>
> Yes, I think that's the idea: Deprecation warning in 3.9, but more
> visible than what 3.7 has. That is, not just at compile time but at run
> time. What's required to make that happen is an open question.
>

I've lost track of who suggested what in this thread, but yes, that concept
has been rolling over in my mind as a potentially good idea after someone
suggested it.  Compile time warnings should turn into bytecode for a
warnings.warn call in the generated pyc.  I haven't spent time trying to
reason if that actually addresses the real issues we're having moving
forward with a syntax warning change though.  A reasonable feature to ask
for in 3.9 or later perhaps.
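
The general shape of "a warnings.warn call in the generated code" can be imitated at the AST level. This is only an illustrative sketch under stated assumptions, not how CPython works and not a worked-out proposal: `compile_with_runtime_warning` is a hypothetical function, and it prepends the call to the module body instead of changing the bytecode or .pyc format.

```python
import ast
import warnings


def compile_with_runtime_warning(source, filename, message):
    """Hypothetical sketch: compile `source` so that every run warns."""
    tree = ast.parse(source, filename)
    prologue = ast.parse(
        "import warnings as _warnings\n"
        f"_warnings.warn({message!r}, DeprecationWarning)\n"
    )
    # Prepend the warning call to the module body. A real implementation
    # would have to keep docstrings and __future__ imports first.
    tree.body[:0] = prologue.body
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


code = compile_with_runtime_warning(
    "answer = 42\n", "<demo>", "invalid escape sequence in <demo>"
)

# Run the same code object twice and count the warnings seen per run.
run_counts = []
for _ in range(2):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        exec(code, {"__name__": "demo"})
    run_counts.append(len(caught))
```

Each execution records one warning, in contrast to a compile-time warning that is seen only when the source is first compiled. Whether the warning is then shown or silenced is left to the ordinary warning filters.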

-gps
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BR7T76SXANRAGJ3QOMWZUEGRVPVP/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Gregory P. Smith
On Fri, Aug 9, 2019 at 8:43 AM Guido van Rossum  wrote:

> This discussion looks like there's no end in sight. Maybe the Steering
> Council should take a vote?
>

I've merged the PR reverting the behavior in 3.8 and am doing the same in
the master branch.

The sheer volume of email this is generating shows that we're not ready to
do this to our users.

Recall the nightmare caused by the md5.py and sha.py DeprecationWarnings in
2.5...  this would be similar.

We need owners of code to see the problems, not end users of other people's
code.

FWIW, lest people think I don't like this change and just pushed the revert
buttons as a result: wrong.  I agree with the ultimate SyntaxError and
believe we should move the language there (it is better for long term code
quality).  But it needs to be done in a way that disrupts the *right*
people in the process, not disrupting an exponentially higher number of
users of other people's code.

If the steering council does anything it should be deciding if we're still
going to do this at all and, if so, planning how we do it without repeating
past mistakes.

-gps
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/W4BUFLMDX5FAFOVLYP4C2LQ2HOTJZEZX/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Steven D'Aprano
I'm not trying to be confrontational, I'm trying to understand your 
use-case(s) and see if it would be broken by the planned change to 
string escapes.


On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:
> On 8/9/2019 2:53 PM, Steven D'Aprano wrote:
> >On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:
> >
> >>The reason I never use raw strings is in the documentation, it is
> >>because \ still has a special meaning, and the first several times I
> >>felt the need for raw strings, it was for directory names that wanted to
> >>end with \ and couldn't.
> >Can you elaborate? I find it unlikely that I would ever want a docstring
> 
> I didn't mention docstring.  I just wanted a string with a path name 
> ending in \.

You said you never used raw strings in the documentation. I read that as 
doc strings. What sort of documentation are you writing that isn't a doc 
string but is inside your .py files where the difference between raw and 
regular strings is meaningful?


> Windows users are used to seeing backslashes in paths, I don't care to 
> be the one to explain why my program uses / and all the rest use \.

If you don't use raw strings for paths, you get to explain why your 
program uses \\ and all the rest use \ *wink*

If they're Windows end users, they won't be reading your source code and 
will never know how you represent hard-coded paths in the source code.

If they're Windows developers, they ought to be aware that the Windows 
file system API allows / anywhere you can use \ and it is the 
common convention in Python to use forward slashes.

I'm also curious why the string needs to *end* with a backslash. Both of 
these are the same path:

C:\foo\bar\baz\
C:\foo\bar\baz


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UT4WDQRJ5U5TA5YYHOM4RRDZV6KEC347/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Glenn Linderman

On 8/9/2019 2:53 PM, Steven D'Aprano wrote:

On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:


The reason I never use raw strings is in the documentation, it is
because \ still has a special meaning, and the first several times I
felt the need for raw strings, it was for directory names that wanted to
end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring


I didn't mention docstring.  I just wanted a string with a path name 
ending in \.



that ends with a backslash:

 def func():
 r"""Documentation goes here...
 more documentation...
 ending with a Windows path that needs a trailing backslash
 like this C:\directory\"""

That seems horribly contrived. Why use backslashes in the path when the
strong recommendation is to use forward slashes?


Windows users are used to seeing backslashes in paths, I don't care to 
be the one to explain why my program uses / and all the rest use \.



And why not solve the problem by simply moving the closing quotes to the
next line, as PEP 8 recommends?

 r"""Documentation ...
 C:\directory\
 """


This isn't my problem, I wasn't using docstrings, and including a 
newline in a path name doesn't work.  I suppose one could "solve" the 
problem by using


r"c:\directory\ "[:-1]

but that is just as annoying as

"c:\\directory\\"

and back when I discovered the problem, I was still learning Python, and 
didn't think of the above solution either.





[...]

Even in a raw literal, quotes can be escaped with a backslash

Indeed, they're not so much "raw" strings as only slightly blanched
strings.




___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EHE2CNRDGS6AF6GYO4DX7UNNE24JH6CG/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Steven D'Aprano
On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:

> The reason I never use raw strings is in the documentation, it is 
> because \ still has a special meaning, and the first several times I 
> felt the need for raw strings, it was for directory names that wanted to 
> end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring 
that ends with a backslash:

def func():
r"""Documentation goes here...
more documentation...
ending with a Windows path that needs a trailing backslash
like this C:\directory\"""

That seems horribly contrived. Why use backslashes in the path when the 
strong recommendation is to use forward slashes?

And why not solve the problem by simply moving the closing quotes to the 
next line, as PEP 8 recommends?

r"""Documentation ...
C:\directory\
"""


[...]
> >Even in a raw literal, quotes can be escaped with a backslash

Indeed, they're not so much "raw" strings as only slightly blanched 
strings.


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Z354BLWONCWMUMFULE64MWUK4TA6PMK2/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread brian . skinn
Nathaniel Smith wrote:
> Unfortunately, their solution isn't a pytest incantation, it's a
> separate 'compileall' invocation they run on their source tree. I'm
> not sure how you'd convert this into a pytest feature, because I don't
> think pytest always knows which parts of your code are your code versus
> which parts are supporting libraries.
> -n

Ahh, I did not appreciate this. :-( Never mind, then!
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VFFV7MJUZKMSD6FS3OONSEN5XLBOLT5R/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Nathaniel Smith
On Fri, Aug 9, 2019 at 12:07 PM  wrote:
>
> Eric V. Smith wrote:
> >  Hopefully the warnings in 3.9 would be more visible than what we saw in
> > 3.7, so that library authors can take notice and do something about it
> > before 3.10 rolls around.
> > Eric
>
> Apologies for the ~double-post on the thread, but: the SymPy team has figured 
> out the right pytest incantation to expose these warnings. Given the 
> extensive adoption of pytest, perhaps it would be good to combine (1) a FR on 
> pytest to add a convenience flag enabling this mix of options with (2) an 
> aggressive "marketing push", encouraging library maintainers to add it to 
> their testing/CI.

Unfortunately, their solution isn't a pytest incantation, it's a
separate 'compileall' invocation they run on their source tree. I'm
not sure how you'd convert this into a pytest feature, because I don't
think pytest always knows which parts of your code are your code versus
which parts are supporting libraries.
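
The same idea as a separate compile pass over one's own tree can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not the SymPy team's actual setup: `scan_tree` is a hypothetical helper, and the caller decides which directory counts as "your code" by passing it in.

```python
import pathlib
import tempfile
import warnings


def scan_tree(root):
    """Compile every .py file under `root`; return (file name, line, message) tuples."""
    findings = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                compile(source, str(path), "exec")
            except SyntaxError as exc:
                # Also report files that fail to compile at all.
                findings.append((path.name, exc.lineno, exc.msg))
        findings.extend((path.name, w.lineno, str(w.message)) for w in caught)
    return findings


# Demonstration on a throwaway tree with one clean and one offending file.
with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "good.py").write_text('x = r"\\d+"\n', encoding="utf-8")
    pathlib.Path(tmp, "bad.py").write_text('y = "\\d+"\n', encoding="utf-8")
    findings = scan_tree(tmp)
```

Because the source is compiled from text on every run, the result does not depend on whether cached .pyc files already exist.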

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/H36DMKUODHOQOYIIZCKW6LYKSGJLXTT4/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Glenn Linderman

On 8/9/2019 9:08 AM, Nick Coghlan wrote:
> On Sat, 10 Aug 2019 at 01:44, Guido van Rossum wrote:
> > This discussion looks like there's no end in sight. Maybe the Steering
> > Council should take a vote?
>
> I find the "Our deprecation warnings were even less visible than
> normal" argument for extending the deprecation period compelling.
>
> I also think the UX of the warning itself could be reviewed to provide
> a more explicit nudge towards using raw strings when folks want to
> allow arbitrary embedded backslashes. Consider:
>
>  SyntaxWarning: invalid escape sequence \,
>
> vs something like:
>
>  SyntaxWarning: invalid escape sequence \, (Note: adding the raw
> string literal prefix, r, will accept all non-trailing backslashes)
>
> After all, the habit we're trying to encourage is "If I want to
> include backslashes without escaping them all, I should use a raw
> string", not "I should memorize the entire set of valid escape
> sequences" or even "I should always escape backslashes".
>
> Cheers,
> Nick.

The reason I never use raw strings is in the documentation: \ still has
a special meaning in them. The first several times I felt the need for
raw strings, it was for directory names that needed to end with \ and
couldn't. The relevant passages are quoted below. Also relevant to the
discussion is the "benefit" of leaving the backslash in the result of an
illegal escape, which no one has mentioned in this huge thread.


Unlike Standard C, all unrecognized escape sequences are left in the 
string unchanged, i.e., /the backslash is left in the result/. (This 
behavior is useful when debugging: if an escape sequence is mistyped, 
the resulting output is more easily recognized as broken.) It is also 
important to note that the escape sequences only recognized in string 
literals fall into the category of unrecognized escapes for bytes 
literals.


Changed in version 3.6: Unrecognized escape sequences produce a
DeprecationWarning. In some future version of Python they will be
a SyntaxError.

Even in a raw literal, quotes can be escaped with a backslash, but the 
backslash remains in the result; for example, |r"\""| is a valid 
string literal consisting of two characters: a backslash and a double 
quote; |r"\"| is not a valid string literal (even a raw string cannot 
end in an odd number of backslashes). Specifically, /a raw literal 
cannot end in a single backslash/ (since the backslash would escape 
the following quote character). Note also that a single backslash 
followed by a newline is interpreted as those two characters as part 
of the literal, /not/ as a line continuation.
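
To make the quoted rules concrete, here is a small sketch (the directory
name is made up) of what a raw literal keeps, what it rejects, and two
common ways to get a path that ends in a backslash anyway.

```python
# r"\"" is valid: the backslash stops the quote from ending the literal,
# but it stays in the result, so the string has two characters.
two_chars = r"\""
assert two_chars == "\\" + '"'
assert len(two_chars) == 2

# A raw literal ending in a single backslash does not compile.
try:
    compile('path = r"C:\\temp\\"', "<example>", "exec")
except SyntaxError:
    trailing_ok = False
else:
    trailing_ok = True
assert not trailing_ok

# Workarounds: append a normal literal, or escape every backslash.
dir_a = r"C:\temp" "\\"
dir_b = "C:\\temp\\"
assert dir_a == dir_b
assert dir_a.endswith("\\")
```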




___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HKLW5VBZK46TOP6WURFH767YCHRFOYNN/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread brian . skinn
Eric V. Smith wrote:
> Hopefully the warnings in 3.9 would be more visible than what we saw in
> 3.7, so that library authors can take notice and do something about it 
> before 3.10 rolls around.
> Eric

Apologies for the ~double-post on the thread, but: the SymPy team has figured 
out the right pytest incantation to expose these warnings. Given the extensive 
adoption of pytest, perhaps it would be good to combine (1) a FR on pytest to 
add a convenience flag enabling this mix of options with (2) an aggressive 
"marketing push", encouraging library maintainers to add it to their testing/CI.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/S2464WJ3QCDE4CBM6AWITHMFCISA6O75/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Jonathan Goble
On Fri, Aug 9, 2019 at 2:36 PM Eric V. Smith  wrote:
>
> On 8/9/2019 2:28 PM, Jonathan Goble wrote:
> > On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan  wrote:
> >> I find the "Our deprecation warnings were even less visible than
> >> normal" argument for extending the deprecation period compelling.
> > Outsider's 2 cents from reading this discussion (with no personal
> > experience with this warning):
> >
> > I am perplexed at the opinion, seemingly espoused by multiple people
> > in this thread, that because a major part of the problem is that the
> > warnings were not visible enough, somehow the proposed solution is
> > making them not visible enough again? It's too late, in my
> > understanding, in the 3.8 cycle to add a new feature like a change to
> > how these warnings are produced (it seems a significant change to the
> > .pyc structure is needed to emit them at runtime), so this supposed
> > "solution" is nothing but kicking the can down the road. When 3.9
> > rolls around, public exposure to the problem of invalid escape
> > sequences will still be approximately what it is now (because if
> > nobody saw the warnings in 3.7, they certainly won't see them in 3.8
> > with this "fix"), so you'll end up with the same complaints about
> > SyntaxWarning that started this discussion, end up back on
> > DeprecationWarning for 3.9 (hopefully with support for emitting them
> > at runtime instead of just compile-time), then have to wait until
> > 3.10/4.0 for SyntaxWarning and eventually the next version to actually
> > make them errors.
>
> Yes, I think that's the idea: Deprecation warning in 3.9, but more
> visible than what 3.7 has. That is, not just at compile time but at run
> time. What's required to make that happen is an open question.
>
> > It seems to me, in my humble but uneducated opinion, that if people
> > are not seeing the warnings, then continuing to give them warnings
> > they won't see isn't a solution to anything. Put the warning front and
> > center. The argument of third-party packages will always be an issue,
> > even if we wait ten years. So put these warnings front and center now
> > so package and code maintainers actually see it, and I'll bet the
> > problematic escape sequences get fixed rather quickly.
> >
> > What am I missing here?
>
> Hopefully the warnings in 3.9 would be more visible than what we saw in
> 3.7, so that library authors can take notice and do something about it
> before 3.10 rolls around.

OK, so I'm at least understanding the plan correctly. I just don't get
the idea of kicking the can down the road on the hope that in 3.9
people will see the warning (knowing that you are still using a
warning that is disabled by default and thus has a high chance of not
being seen until 3.10), when we already have the ability to push out a
visible-by-default warning now in 3.8 and get people to take notice
two whole feature releases (= about 3 years) earlier.

The SyntaxWarning disruption (or SyntaxError disruption) has to happen
eventually, and while I support the idea of making compile-time
DeprecationWarnings be emitted at runtime, I really don't think that a
disabled-by-default warning is going to change a whole lot. Sure, the
major packages will likely see it and update their code, but lots of
smaller specialty packages and independent developers won't see it in
3.9. The bulk of the change isn't going to happen until we go to
SyntaxWarning, so why not just get it over with instead of dragging it
out for three years?
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CWM2KO5IA24UCBSAYJP735EYKXIRRQRG/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Eric V. Smith

On 8/9/2019 2:28 PM, Jonathan Goble wrote:
> On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan wrote:
> > I find the "Our deprecation warnings were even less visible than
> > normal" argument for extending the deprecation period compelling.
>
> Outsider's 2 cents from reading this discussion (with no personal
> experience with this warning):
>
> I am perplexed at the opinion, seemingly espoused by multiple people
> in this thread, that because a major part of the problem is that the
> warnings were not visible enough, somehow the proposed solution is
> making them not visible enough again? It's too late, in my
> understanding, in the 3.8 cycle to add a new feature like a change to
> how these warnings are produced (it seems a significant change to the
> .pyc structure is needed to emit them at runtime), so this supposed
> "solution" is nothing but kicking the can down the road. When 3.9
> rolls around, public exposure to the problem of invalid escape
> sequences will still be approximately what it is now (because if
> nobody saw the warnings in 3.7, they certainly won't see them in 3.8
> with this "fix"), so you'll end up with the same complaints about
> SyntaxWarning that started this discussion, end up back on
> DeprecationWarning for 3.9 (hopefully with support for emitting them
> at runtime instead of just compile-time), then have to wait until
> 3.10/4.0 for SyntaxWarning and eventually the next version to actually
> make them errors.

Yes, I think that's the idea: Deprecation warning in 3.9, but more
visible than what 3.7 has. That is, not just at compile time but at run
time. What's required to make that happen is an open question.

> It seems to me, in my humble but uneducated opinion, that if people
> are not seeing the warnings, then continuing to give them warnings
> they won't see isn't a solution to anything. Put the warning front and
> center. The argument of third-party packages will always be an issue,
> even if we wait ten years. So put these warnings front and center now
> so package and code maintainers actually see it, and I'll bet the
> problematic escape sequences get fixed rather quickly.
>
> What am I missing here?

Hopefully the warnings in 3.9 would be more visible than what we saw in
3.7, so that library authors can take notice and do something about it
before 3.10 rolls around.


Eric
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GGZY7B2WFHVXRQ7NVTHGC2F4L5RJIKDI/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Jonathan Goble
On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan  wrote:
>
> I find the "Our deprecation warnings were even less visible than
> normal" argument for extending the deprecation period compelling.

Outsider's 2 cents from reading this discussion (with no personal
experience with this warning):

I am perplexed at the opinion, seemingly espoused by multiple people
in this thread, that because a major part of the problem is that the
warnings were not visible enough, somehow the proposed solution is
making them not visible enough again? It's too late, in my
understanding, in the 3.8 cycle to add a new feature like a change to
how these warnings are produced (it seems a significant change to the
.pyc structure is needed to emit them at runtime), so this supposed
"solution" is nothing but kicking the can down the road. When 3.9
rolls around, public exposure to the problem of invalid escape
sequences will still be approximately what it is now (because if
nobody saw the warnings in 3.7, they certainly won't see them in 3.8
with this "fix"), so you'll end up with the same complaints about
SyntaxWarning that started this discussion, end up back on
DeprecationWarning for 3.9 (hopefully with support for emitting them
at runtime instead of just compile-time), then have to wait until
3.10/4.0 for SyntaxWarning and eventually the next version to actually
make them errors.

It seems to me, in my humble but uneducated opinion, that if people
are not seeing the warnings, then continuing to give them warnings
they won't see isn't a solution to anything. Put the warning front and
center. The argument of third-party packages will always be an issue,
even if we wait ten years. So put these warnings front and center now
so package and code maintainers actually see it, and I'll bet the
problematic escape sequences get fixed rather quickly.

What am I missing here?
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6ZBX2PULRGIRUBQ735ONGV2RZU2LP3WQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Paul Moore
On Fri, 9 Aug 2019 at 17:55, Steve Dower  wrote:
> > * change the SyntaxWarning into a default-silenced one that fires every 
> > time a .pyc is loaded (this is the hard part, but it's doable)
> > * change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to explicitly 
> > warn when the path contains control characters
> > * change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to 
> > append (or chain) an extra message when either of the filenames contains 
> > control characters (or change OSError to do it, or the default 
> > sys.excepthook)

The second and third parts of this seem like they are both independent
of the first, and useful improvements in their own right.
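
As an illustration of the second item, the check could be as simple as
the sketch below. warn_on_control_chars is a hypothetical helper, not an
existing pathlib or os function, and the choice of RuntimeWarning is an
assumption.

```python
import warnings


def warn_on_control_chars(path):
    """Warn if `path` contains ASCII control characters, then return it.

    Hypothetical helper: a tab or newline in a Windows path usually comes
    from an unintended escape in a non-raw string literal.
    """
    bad = sorted({ch for ch in path if ord(ch) < 32 or ord(ch) == 127})
    if bad:
        warnings.warn(
            f"path {path!r} contains control characters {bad!r}; "
            "was a raw string or an escaped backslash intended?",
            RuntimeWarning,
            stacklevel=2,
        )
    return path


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_control_chars("C:\temp\new")    # "\t" and "\n" are real escapes
    warn_on_control_chars(r"C:\temp\new")   # raw string: nothing to report
assert len(caught) == 1
```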

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6L65KDXMRTTLHX7HWAU4WLRMHEH7GXFA/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Steve Dower

On 09Aug2019 0905, Serhiy Storchaka wrote:
> 09.08.19 18:30, Guido van Rossum wrote:
> > This discussion looks like there's no end in sight. Maybe the Steering
> > Council should take a vote?
>
> Possible options:
>
> 1. SyntaxWarning in 3.8+ (the current status).
> 2. DeprecationWarning in 3.8, SyntaxWarning in 3.9+ (revert changes in
>    3.8 only).
> 3. DeprecationWarning in 3.8 and 3.9 (revert changes in master and 3.8).
> 4. No warnings at all.

I also posted another possible option that helps solve the real problem
faced by users, and not just the "we want to have a warning" problem
that is purely ours.

> * change the SyntaxWarning into a default-silenced one that fires every
>   time a .pyc is loaded (this is the hard part, but it's doable)
> * change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to
>   explicitly warn when the path contains control characters
> * change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to
>   append (or chain) an extra message when either of the filenames
>   contains control characters (or change OSError to do it, or the
>   default sys.excepthook)


Cheers,
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GZJPZ55OR2CIERO5Q4ETPZPAQZSFAEDD/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Nick Coghlan
On Sat, 10 Aug 2019 at 01:44, Guido van Rossum  wrote:
>
> This discussion looks like there's no end in sight. Maybe the Steering 
> Council should take a vote?

I find the "Our deprecation warnings were even less visible than
normal" argument for extending the deprecation period compelling.

I also think the UX of the warning itself could be reviewed to provide
a more explicit nudge towards using raw strings when folks want to
allow arbitrary embedded backslashes. Consider:

SyntaxWarning: invalid escape sequence \,

vs something like:

SyntaxWarning: invalid escape sequence \, (Note: adding the raw
string literal prefix, r, will accept all non-trailing backslashes)

After all, the habit we're trying to encourage is "If I want to
include backslashes without escaping them all, I should use a raw
string", not "I should memorize the entire set of valid escape
sequences" or even "I should always escape backslashes".
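
As a quick illustration of that habit (the pattern itself is just an
example): a regular expression written as a raw string is the same
string as the fully escaped version, without anyone having to remember
which escapes are valid.

```python
import re

escaped = "\\d+\\.\\d+"   # every backslash doubled by hand
raw = r"\d+\.\d+"         # raw literal: backslashes are kept as typed

assert raw == escaped
assert re.fullmatch(raw, "3.8") is not None
```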

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/N7Q4R3GX5RBF3FPGWMMKWYB4LOI7GVOC/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Serhiy Storchaka

09.08.19 18:30, Guido van Rossum wrote:
> This discussion looks like there's no end in sight. Maybe the Steering
> Council should take a vote?


Possible options:

1. SyntaxWarning in 3.8+ (the current status).
2. DeprecationWarning in 3.8, SyntaxWarning in 3.9+ (revert changes in
   3.8 only).
3. DeprecationWarning in 3.8 and 3.9 (revert changes in master and 3.8).
4. No warnings at all.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TRZHNEITUZTDEEHSFWV5SUEXNRHTU3KQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Guido van Rossum
This discussion looks like there's no end in sight. Maybe the Steering
Council should take a vote?

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5BYIHT2BV7TPDHP6F5W44K4JKN5PHQQ3/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread brian . skinn
> This whole thread would be an excellent justification for following 3.9
> with 4.0. It's as near as we ever want to get to a breaking change, and a
> major version number would indicate the need to review. If increasing
> strictness of escape code interpretation in string literals is the only
> incompatibility there would surely be general delight.
> 
> Kind regards,
> Steve Holden

I rather doubt that allowing breaking changes into a Python 4.0 would end up 
with this as the only proposed incompatibility. Once word got out, a flood of 
incompat requests would probably get raised. I personally have a change I'd 
like made to doctest (https://bugs.python.org/issue36714), and I know of 
another in argparse (https://bugs.python.org/issue33109) that I'm personally 
neutral on but that others have stronger feelings about.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SEBJF7C7RRG3Q3MFD5D6CTOFZUX7DNSE/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Chris Angelico
On Fri, Aug 9, 2019 at 11:22 PM Steven D'Aprano  wrote:
>
> And this change won't fix that, because *good* paths that currently work
> today will fail in the future, but *bad* paths that silently do the
> wrong thing will continue to silently do the wrong thing.

Except that many paths can be both "good" and "bad", because paths
have multiple components. So the warning has a VERY high probability
of happening.

But I've given up on this debate. No more posts from me. Some things
aren't worth fighting for. With the number of words posted in this
thread saying "we need convenience, not correctness", I'm done
arguing.

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/K4T7V5Z5GSGGK7HO73ZMFGTTAGMKHDE3/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-09 Thread Steven D'Aprano
On Wed, Aug 07, 2019 at 07:47:45PM +1000, Chris Angelico wrote:
> On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano  wrote:
> > What's the rush? Let's be objective here: what benefit are we going to
> > get from this change? Is there anyone hanging out desperately for "\d"
> > and "\-" to become SyntaxErrors, so they can... do what?
> 
> So that problems can start to be detected. Time and again, Python
> users on Windows get EXTREMELY confused by the way their code worked
> perfectly with one path, then bizarrely fails with another. That is a
> very real problem, and the problem is that it appeared to work when
> actually it was wrong.

And this change won't fix that, because *good* paths that currently work 
today will fail in the future, but *bad* paths that silently do the 
wrong thing will continue to silently do the wrong thing.


py> filename = "location\data"  # will work correctly
<stdin>:1: SyntaxWarning: invalid escape sequence \d

py> filename = "location\temp"  # doesn't work as expected, but no error
py>


Effectively, we are hoping that Windows users will infer from the 
failure of "\d" (say) that they shouldn't use "\t" even though it 
doesn't raise. Perhaps some of them will, but I maintain we're talking 
about a small, incremental improvement, not something that will once and 
for all fix the problem.
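
The difference between the two paths is easy to see by inspecting the
resulting strings. In this sketch the literals are compiled from source
text so that the warning can be captured; literal_value and the variable
names are illustrative only, and the behaviour shown is that of Python
versions in which an invalid escape is still a warning rather than an
error.

```python
import warnings


def literal_value(source_literal):
    """Evaluate a string literal given as source text, collecting warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = eval(compile(source_literal, "<example>", "eval"))
    return value, [str(w.message) for w in caught]


data, data_warnings = literal_value('"location\\data"')
temp, temp_warnings = literal_value('"location\\temp"')

# \d is not an escape, so the backslash survives (with a warning) ...
assert data == "location" + "\\" + "data"
assert data_warnings
# ... while \t is a valid escape: no warning, but the path now has a tab.
assert temp == "location" + "\t" + "emp"
assert not temp_warnings
```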

I don't think this is a benefit for users of any operating system except 
Windows users. For Linux, Unix, Mac users, one could argue strongly that 
we're making the string escape experience a tiny bit *worse*, not 
better. Raymond's example of ASCII art for example.

I think the subset of users that this will help is quite small:

- users on Windows;

- who haven't read or paid attention to the innumerable recommendations
  on the web and the documentation that they always use forward slashes
  in paths;

- who happen to use an escape like \d rather than \t;

- and will read and understand the eventual SyntaxWarning/Error;

- and infer from that error that they should change their path to use
  forward slashes instead of backslashes;

- and all this happens *before* they get bitten by the \t problem and
  they learn the hard way not to use backslashes in paths.

I'm not saying this isn't worth doing. I'm saying it's a small benefit 
that *right now* is a lot less than the cost to library authors and users.


> Python has a history of fixing these problems. It used to be that
> b"\x61\x62\x63\x64" was equal to u"abcd", but now Python sees these as
> fundamentally different.

Yes, and we fixed that over a 10+ year period involving no fewer than 
three full releases in the Python 2.x series and eight full releases in 
the Python 3.x series, and the transition period is not over yet since 
2.7 is not yet EOLed.
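
For readers who never used Python 2, the separation referred to here
looks like this in Python 3 (a minimal sketch):

```python
raw = b"\x61\x62\x63\x64"   # four bytes
text = "abcd"               # four characters

assert isinstance(raw, bytes) and isinstance(text, str)
# The two only meet through an explicit encoding step.
assert raw.decode("ascii") == text
assert text.encode("ascii") == raw

# Mixing them implicitly is an error rather than a silent conversion.
try:
    raw + text
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True
assert not mixed_ok
```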



> Data-dependent bugs caused by a syntactic
> oddity are a language flaw that needs to be fixed.

There is always a tradeoff between the severity of the flaw and how much 
pain we are willing to accept to fix it. I think Raymond has made a good 
case that in this instance, the pain of fixing it *now* is greater than 
the benefit.

(I don't think he has made the case to reverse the deprecation 
altogether.)

If the benefit versus pain never moves into the black, then we should 
keep the status quo indefinitely, like any other language wart or 
misfeature we're stuck with due to backwards compatibility.

("Although never is often better than *right* now.")

But having said that, I'm confident that given an improved deprecation 
process that makes it easier for library authors to see the warning 
before end-users, we will be able to move forward in a release or two.


> > Because our processes don't work the way we assumed, it turns out that
> > in practice we haven't given developers the deprecation period we
> > thought we had. Read Nathaniel's post, if you haven't already done so:
> >
> > https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74OBYEY3PVLNQG2ZAVRO653LD5K/
> >
> > He makes a compelling case that while we might have had the promised
> > deprecation period by the letter of the law, in practice most developers
> > will have never seen it, and we will be breaking the spirit of the
> > promise if we continue with the unmodified plan.
> 
> Yes, that's a fair complaint. But merely pushing the deprecation back
> by a version is not solving it. There has to be SOMETHING done
> differently.

"We must do SOMETHING!!! This is something, therefore we must do it!!!"

I agree that we ought to fix the problem with the deprecation warnings.

What I don't agree with is the demand that unless I can give a fix for 
the deprecation warning issue *right now* we must stay the course no 
matter how annoying and painful it is for users and library authors.


> > And yet here we are rushing through a breaking change in an accelerated
> > manner, for a change of marginal benefit.
> 
> It's not a marginal benefit. For people who try to teach Python on
> multiple operating systems, this 

[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Stephen J. Turnbull
Steve Holden writes:

 > This whole thread would be an excellent justification for following 3.9
 > with 4.0. It's as near as we ever want to get to a breaking change, and a
 > major version number would indicate the need to review. If increasing
 > strictness of escape code interpretation in string literals is the only
 > incompatibility there would surely be general delight.

This should be the first chapter in the Beautiful Version Numbering
book!  I love it!
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LY2RX4ROGH54IU57RO7Y2O6IDDV5LUBG/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Jim J. Jewett
FWIW, the web archive 
https://mail.python.org/archives/list/python-dev@python.org/thread/ZX2JLOZDOXWVBQLKE4UCVTU5JABPQSLB/
 does not seem to display the problems ... apparently the individual messages 
are not included in view source, and are cleaned up for chrome's inspect.  I'm 
not sure whether that counts as a bug in the archiving or not.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UYSBJFII467TKNA2SDYCJZUQFLCGEAKY/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Eric V. Smith

On 8/5/2019 4:30 PM, raymond.hettin...@gmail.com wrote:


Thanks for weighing in.  I think this is an important usability discussion.  
IMO it is the number one issue affecting the end user experience with this 
release.   If we could get more people to actively use the beta release, the 
issue would stand-out front and center.  But if people don't use the beta in 
earnest, we won't have confirmation until it is too late.

We really don't have to go this path.  Arguably, the implicit conversion of 
'\latex' to '\\latex' is a feature that has existed for three decades, and now 
we're deciding to turn it off to define existing practices as errors.  I don't 
think any commercial product manager would allow this to occur without a lot of 
end user testing.


As much as I'd love to force this change through [0], it really does 
seem like we're forcing it. Especially given Nathaniel's point about the 
discoverability problems with compile-time warnings, I think we should 
delay a visible warning about this. Possibly in 3.9 we can do something 
about making these warnings visible at run time, not just compile time. 
I had a similar problem with f-strings (can't recall the details now, 
since resolved), and the compile-time-only nature made it difficult to 
notice. I realize a run-time warning for this would require a fair bit 
of work that might not be worth it.
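
The compile-time-only behaviour can be demonstrated with a short sketch:
the warning is attached to compiling the source, so running the
resulting code object (which is roughly what happens when a cached .pyc
is loaded) says nothing. The source snippet is made up for the example.

```python
import warnings

source = 'pattern = "\\d+"\n'   # source text containing the invalid escape \d

with warnings.catch_warnings(record=True) as at_compile:
    warnings.simplefilter("always")
    code = compile(source, "<example>", "exec")

with warnings.catch_warnings(record=True) as at_run:
    warnings.simplefilter("always")
    namespace = {}
    exec(code, namespace)        # analogous to importing from a cached .pyc

assert len(at_compile) >= 1      # reported when the source is compiled
assert len(at_run) == 0          # silent on every later execution
assert namespace["pattern"] == "\\d+"
```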


I think Raymond's point goes beyond this. I think he's proposing that we 
never make this change. I'm sympathetic to that, too. But the first step 
is to change 3.8's behavior to not make this visible. That is, we should 
restore the 3.7 warning behavior.


Eric

[0] And the real reason I'd like this is so we can add \e
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UADZYIYTPGNRELG477F3SSRB3K7R2J75/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Jeroen Demeyer
> When you take a text string and create a string literal to represent
> it, sometimes you have to modify it to become syntactically valid.

Even simpler: use r""" instead of """

The only case where that won't work is when you need actual escape
sequences. But I find this very rare in practice for triple-quoted
strings.
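
A minimal sketch of the difference, using one line borrowed from the
vCard example elsewhere in this thread:

```python
plain = """=0ABaytown\\, LA 30314"""   # backslash escaped by hand
raw = r"""=0ABaytown\, LA 30314"""     # raw prefix: typed as in the vCard

assert raw == plain
assert raw.count("\\") == 1
```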
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LV5STHINBEREK2Y43OQLFUOBQPN2AXZC/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Christian Tismer
Hey friends,

This is IMHO a great idea.
If a package claims to be Python 3.8 compatible, then it has to
be correct concerning invalid escapes.

A new pip version could perhaps even refuse packages with such
literals when they claim to support Python 3.8.

But how can it actually happen that a pre-3.8 package gets installed
when you install Python 3.8? Does pip allow installation without
a section that defines the allowed versions?

Ok, maybe packages are claimed for Python 3.8 and not further checked.

But let's assume the third-party things that Raymond sees do _not_
come from pip, but elsewhere. Pre-existing stuff that is somehow copied
into the newer Python version? Sure, quite possible!

But then it is quite likely that those third-party things still
have their creation date from pre-3.8 time.
What about the simple heuristic that a Python module with a creation
date earlier than xxx does simply not issue the annoying warning?

Maybe that already cures the disease in enough cases?

just a wild idea - \leave \old \code \untouched -- ciao - \Chris


On 06.08.19 18:59, Neil Schemenauer wrote:
> 
> Making it an error so soon would be a mistake, IMHO.  That will break
> currently working code for small benefit.  When Python was a young
> language with a few thousand users, it was easier to make these
> kinds of changes.  Now, we should be much more conservative and give
> people a long time and a lot of warning.  Ideally, we should provide
> tools to fix code if possible.
> 
> Could PyPI and pip gain the ability to warn and even fix these
> issues?  Having a warning from pip at install time could be better
> than a warning at import time.  If linting was built into PyPI, we
> could even do a census to see how many packages would be affected by
> turning it into an error.
> 
> On 2019-08-05, raymond.hettin...@gmail.com wrote:
>> P.S. In the world of C compilers, I suspect that if the relatively
>> new compiler warnings were treated as errors, the breakage would
>> be widespread. Presumably that's why they haven't gone down this
>> road.
> 
> The comparison with C compilers is relevant.  C and C++ represent a
> fairly extreme position on not breaking working code.   E.g. K & R
> style function declarations were supported for decades.  I don't
> think we need to go quite that far but also one or two releases is
> not enough time.
> 
> Regards,
> 
>   Neil
> ___
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/V2EDFDJGXRIDMKJU3FKIWC2NDLMUZA2Y/
> 


-- 
Christian Tismer :^)   tis...@stackless.com
Software Consulting  : http://www.stackless.com/
Karl-Liebknecht-Str. 121 : https://github.com/PySide
14482 Potsdam: GPG key -> 0xFB7BEE0E
phone +49 173 24 18 776  fax +49 (30) 700143-0023
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XVJCYXDZ7VPMMCTP2BPNAJ3OO7S4II4V/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Dima Tisnek
These two ought to be converted to raw strings, shouldn't they?

On Thu, 8 Aug 2019 at 08:04,  wrote:
>
> For me, these warnings are continuing to arise almost daily.  See two recent 
> examples below.  In both cases, the code previously had always worked without 
> complaint.
>
> - Example from yesterday's class 
>
> ''' How old-style formatting works with positional placeholders
>
> print('The answer is %d today, but was %d yesterday' % (new, old))
>  \o
>   \o
> '''
>
> SyntaxWarning: invalid escape sequence \-
>
> - Example from today's class 
>
> # Cut and pasted from:
> # https://en.wikipedia.org/wiki/VCard#vCard_2.1
> vcard = '''
> BEGIN:VCARD
> VERSION:2.1
> N:Gump;Forrest;;Mr.
> FN:Forrest Gump
> ORG:Bubba Gump Shrimp Co.
> TITLE:Shrimp Man
> PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
> TEL;WORK;VOICE:(111) 555-1212
> TEL;HOME;VOICE:(404) 555-1212
> ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
> LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
>  =0ABaytown\, LA 30314=0D=0AUnited States of America
> ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
> LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
>  Baytown, LA 30314=0D=0AUnited States of America
> EMAIL:forrestg...@example.com
> REV:20080424T195243Z
> END:VCARD
> '''
>
> SyntaxWarning: invalid escape sequence \,
> ___
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/OYGRL5AWSJZ34MDLGIFTWJXQPLNSK23S/
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/P5YTWGKVSR5EFTHHUKOXW32CBEUYIRW2/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Dima Tisnek
I feel this is one of those cases where we're expecting early adopters
to proactively post pull requests against affected libraries or, failing
that, to open issues against them.

I was ready to do just that, but alas didn't even have to!
Matt's analysis shows that it's not too hard.

What was hard for me were the rules. In fact, not being up to date, I
couldn't even find the PEP that specified the change.

What the Python devs could do is guide users on how to update existing code.
Something like python3.8 -c 'print(repr("\b\l\a\h"))' but with sensible output.
And instructions for those who support both py2 and py3 from the same codebase.

I could hope for a feature in psf/black, but maybe that's not for everyone.
Just my 2c :)
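
A minimal sketch of the kind of helper described above, assuming the compiler reports unrecognized escapes through the warnings machinery (the category has been DeprecationWarning or SyntaxWarning depending on the release). find_invalid_escapes is a hypothetical name, not an existing tool:

```python
import warnings


def find_invalid_escapes(source, filename="<string>"):
    """Return (line number, message) pairs for escape-sequence warnings.

    Hypothetical helper: it compiles the source without running it and
    collects whatever escape-sequence warnings the compiler emits.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            compile(source, filename, "exec")
        except SyntaxError as exc:
            # Genuine syntax errors, and any release that rejects the
            # escape outright, land here.
            return [(exc.lineno, str(exc))]
    return [
        (w.lineno, str(w.message))
        for w in caught
        if "invalid escape sequence" in str(w.message)
    ]


# Source text with one ordinary literal and one raw literal.
sample = "path = 'C:\\docs'\nok = r'C:\\docs'\n"
for lineno, message in find_invalid_escapes(sample):
    print(f"line {lineno}: {message}")
```

Only the ordinary literal should be reported; the raw literal on the second line is left alone.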

On Mon, 5 Aug 2019 at 13:30,  wrote:
>
> We should revisit what we want to do (if anything) about invalid escape 
> sequences.
>
> For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which 
> is visible by default.  The intention is to make it a SyntaxError in Python 
> 3.9.
>
> This once seemed like a reasonable and innocuous idea to me; however, I've 
> been using the 3.8 beta heavily for a month and no longer think it is a good 
> idea.  The warning crops up frequently, often due to third-party packages 
> (such as docutils and bottle) that users can't easily do anything about.  And 
> during live demos and student workshops, it is especially distracting.
>
> I now think our cure is worse than the disease.  If code currently has a 
> non-raw string with '\latex', do we really need Python to yelp about it (for 
> 3.8) or reject it entirely (for 3.9)?   If someone can't remember exactly 
> which special characters need to be escaped, do we really need to stop them 
> in their tracks during a data analysis session?  Do we really need to reject 
> ASCII art in docstrings: ` \---> special case'?
>
> IIRC, the original problem to be solved was false positives rather than false 
> negatives:  filename = '..\training\new_memo.doc'.  The warnings and errors 
> don't do (and likely can't do) anything about this.
>
> If Python 3.8 goes out as-is, we may be punching our users in the nose and 
> getting almost no gain from it.  ISTM this is a job best left for linters.  
> For a very long time, Python has been accepting the likes of 'more \latex 
> markup' and has been silently converting it to 'more \\latex markup'.  I now 
> think it should remain that way.  This issue in the 3.8 beta releases has 
> been an almost daily annoyance for me and my customers. Depending on how you 
> use Python, this may not affect you or it may arise multiple times per day.
>
>
> Raymond
>
> P.S.  Before responding, it would be a useful exercise to think for a moment 
> about whether you remember exactly which characters must be escaped or 
> whether you habitually put in an extra backslash when you aren't sure.  Then 
> see:  https://bugs.python.org/issue32912
> ___
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/ZX2JLOZDOXWVBQLKE4UCVTU5JABPQSLB/
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/F2ZIHAT2EIWM5IOJFP2THGUOSFZJ3Z2W/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-08 Thread Glenn Linderman

On 8/7/2019 6:13 PM, raymond.hettin...@gmail.com wrote:
> This isn't about me.  As a heavy user of the 3.8 beta, I'm just the canary in
> the coal mine.

Are you, with an understanding of the issue, submitting bug reports on 
the issues you find, thus helping to alleviate the problem, and educate 
the package maintainers?


Or are you just carping here?

I'll apologize in advance for using the word "carping" if the answer to 
my first question is yes. :)


Glenn
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/N26WJ2BCYT7CPFRHZGLQKILDCCKDTV5N/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Serhiy Storchaka

08.08.19 07:55, Toshio Kuratomi wrote:
> Like the Ansible feature, though, the problem is that over time we've
> discovered that it is hard to educate users about the exact
> characteristic of the feature (\k == k but \n == newline;


No, \k == \\k. This differs from most other programming languages.
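
A short illustration of the point, assuming an interpreter that still accepts unrecognized escapes with a warning. The literals are evaluated from source text so the warning can be silenced locally; eval_literal is a hypothetical helper name:

```python
import ast
import warnings


def eval_literal(literal_source):
    """Evaluate a string literal given as source text, silencing warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(literal_source)


unknown = eval_literal(r"'\k'")   # backslash followed by an unrecognized letter
doubled = eval_literal(r"'\\k'")  # explicitly escaped backslash
newline = eval_literal(r"'\n'")   # a recognized escape

print(unknown == doubled)          # the backslash is kept, so these are equal
print(len(unknown), len(newline))  # two characters versus one
```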
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7OTMWGJOMXT6F6NONVSL2WLFG3VPP4B6/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Toshio Kuratomi
On Mon, Aug 5, 2019 at 6:47 PM  wrote:
>
> I wish people with more product management experience would chime in; 
> otherwise, 3.8 is going to ship with an intentional hard-to-ignore annoyance 
> on the premise that we don't like the way people have been programming and 
> that they need to change their code even if it was working just fine.
>

I was resisting weighing in since I don't know the discussion around
deprecating this language feature in the first place (other than
what's given in this thread).  However, in the product I work on we
made a very similar change in our last release so I'll throw it out
there for people to take what they will from it.

We have a long standing feature which allows people to define groups
of hosts and give them a name.  In the past that name could include
dashes, dots, and other characters which are not legal as Python
identifiers.  When users use those group names in our "DSL" (not truly
a DSL but close enough), they can do it using either dictionary-lookup
syntax (groupvars['groupname']) or using dotted attribute notation
groupvars.groupname.  We also have a longstanding problem where users
will try to do something like groupvars.group-name using the
dotted attribute notation with group names that aren't proper python
identifiers.  This causes problems as the name then gets split on the
characters that aren't legal in identifiers and results in something
unexpected (undefined variable, an actual subtraction operation, etc).
In our last release we decided to deprecate and eventually make it
illegal to use non-python-identifiers for the group names.
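
The splitting problem can be illustrated in plain Python. Our "DSL" is not Python, so this is only an analogy; groupvars and group-name are the names used in the paragraph above:

```python
import ast

# A name with a dash is a fine dictionary key but not an identifier.
print("group-name".isidentifier())

# Dictionary-lookup syntax keeps the name in one piece.
lookup = ast.parse("groupvars['group-name']", mode="eval").body
print(type(lookup).__name__)

# Dotted notation is split at the dash and parsed as a subtraction:
# (groupvars.group) - name
dotted = ast.parse("groupvars.group-name", mode="eval").body
print(type(dotted).__name__, type(dotted.op).__name__)
```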

At first, product management *did* let us get away with this.  But
after some time and usage of the pre-releases, they came to realize
that this was a major problem.  Users had gotten used to being able
to use these characters in their group names.  They had defined their
group names and gotten used to typing their group names and built up a
whole body of playbooks that used these group names.

Product management still let us get away with this... sort of. The
scope of the change was definitely modified.  Users were now allowed
to select whether invalid group names were disallowed (so they could
port their installations), allowed with a warning (presumably so they
could do work but also see that they were affected) or allowed without a
warning (presumably because they knew not to use these group names
with dotted attribute notation).  This feature was also no longer
allowed to be deprecated... We could have a warning that said "Don't
do this" but not remove the feature in the future.

Now... I said this was a config option. So what we do have in the
release is that the config option allows but warns by default and *the
config option* has a deprecation warning.  You see... we're planning
on changing from warn by default now to disallowing by default in the
future so the deprecation is flagging the change in config value.

And you know what?  Users absolutely hate this.  They don't like the
warning.  They don't like the implication that they're doing something
wrong by using a long-standing feature.  They don't like that we're
going to change the default so that their current group names will
break.  They dislike that it's being warned about because of
attribute-lookup-notation which they can just learn not to use with
their group names.  They dislike this so much that some of us have
talked about abandoning this idea... instead, having a public group
name that users use when they write in the "DSL" and an internal group
name that we use when evaluating the group names. Perhaps that works,
perhaps it doesn't, but I think that's where my story starts being
specific to our feature and no longer applicable to Python and escape
sequences.

Now like I said, I don't know the discussions that led to invalid
escape sequences being deprecated so I don't know whether there are more
compelling reasons for doing it but it seems to me that there's even
less to gain by doing this than what we did in Ansible.  The thing
Ansible is complaining about can do the wrong thing when used in
conjunction with certain other features of our "DSL".  The things that
the Python escape sequence warning complains about are never invalid (as
was pointed out, it's complaining when a sequence of two characters
will do what the user intended rather than complaining when a sequence
of two characters will do something that the user did not intend).
Like the Ansible feature, though, the problem is that over time we've
discovered that it is hard to educate users about the exact
characteristic of the feature (\k == k but \n == newline;
groupvars['group-name']  works but groupvars.group-name does not) so
we've both given up on continuing to educate the users in favor of
attempting to nanny the user into not using the feature.  That most
emphatically has not worked for us and has spent a bunch of goodwill
with our users but the python userbase is not 

[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread raymond . hettinger
This isn't about me.  As a heavy user of the 3.8 beta, I'm just the canary in 
the coal mine.

After many encounters with these warnings, I'm starting to believe that 
Python's long-standing behavior was convenient for users.  Effectively, "\-" 
wasn't an error, it was just a way of writing "\\-". For the most part, that 
worked out fine. Sure, we've all seen interactive prompt errors from having \t in 
a pathname but not in production (likely because a FileNotFoundError would 
surface immediately).
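
A sketch of the pathname case, using a made-up path. Every escape here is a recognized one, so no warning is involved at all:

```python
# Hypothetical Windows-style path written as an ordinary (non-raw) literal.
plain = '..\training\new_memo.doc'
raw = r'..\training\new_memo.doc'

print(repr(plain))                   # shows a tab and a newline in the "path"
print(len(plain), len(raw))          # the plain literal is two characters shorter
print('\t' in plain, '\n' in plain)  # both control characters are present
```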
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4YNZYCOBWGMLC6BDXQFJJWLXEK47I5PU/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread MRAB

On 2019-08-07 23:43, Steve Holden wrote:
> This whole thread would be an excellent justification for following 3.9
> with 4.0. It's as near as we ever want to get to a breaking change, and
> a major version number would indicate the need to review. If increasing
> strictness of escape code interpretation in string literals is the only
> incompatibility there would surely be general delight.



I can think of another possible one: import * requires __all__.

[snip]
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WQEHMFMR7IRWYDXDSCZUGJKGDI5HNEDK/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Chris Angelico
On Thu, Aug 8, 2019 at 8:58 AM  wrote:
>
> For me, these warnings are continuing to arise almost daily.  See two recent 
> examples below.  In both cases, the code previously had always worked without 
> complaint.
>
> - Example from yesterday's class 
>
> ''' How old-style formatting works with positional placeholders
>
> print('The answer is %d today, but was %d yesterday' % (new, old))
>  \o
>   \o
> '''
>
> SyntaxWarning: invalid escape sequence \-

I've no idea why this is even a string literal, but if it absolutely
has to be, then you could use a character other than backslash.

> - Example from today's class 
>
> # Cut and pasted from:
> # https://en.wikipedia.org/wiki/VCard#vCard_2.1
> vcard = '''
> ...
> LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
>  =0ABaytown\, LA 30314=0D=0AUnited States of America
> ...
> '''
>
> SyntaxWarning: invalid escape sequence \,

When you take a text string and create a string literal to represent
it, sometimes you have to modify it to become syntactically valid.
This is exactly the sort of thing that SHOULD be being warned about,
because it's sometimes going to work and sometimes not, depending on
the exact data you're working with. Please don't teach people the
habit of pretending that the backslash isn't significant.
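
For the vCard line quoted above, either of the usual adjustments yields the same data. A minimal sketch using only that one line:

```python
# Raw literal: the backslash before the comma is taken literally.
raw_line = r'=0ABaytown\, LA 30314=0D=0AUnited States of America'

# Ordinary literal with the backslash doubled.
escaped_line = '=0ABaytown\\, LA 30314=0D=0AUnited States of America'

print(raw_line == escaped_line)
print(raw_line)
```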

If the warning were changed to be silent for 3.8, what would you do
differently? How would having extra time to solve this problem help
you?

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WCNY3C7VBLCP5RDKKVMMEMN7R26GK2FI/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread raymond . hettinger
For me, these warnings are continuing to arise almost daily.  See two recent 
examples below.  In both cases, the code previously had always worked without 
complaint.

- Example from yesterday's class 

''' How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old))
 \o
  \o
'''
   
SyntaxWarning: invalid escape sequence \-

- Example from today's class 

# Cut and pasted from: 
# https://en.wikipedia.org/wiki/VCard#vCard_2.1
vcard = '''
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
 =0ABaytown\, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
 Baytown, LA 30314=0D=0AUnited States of America
EMAIL:forrestg...@example.com
REV:20080424T195243Z
END:VCARD
'''

SyntaxWarning: invalid escape sequence \,
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OYGRL5AWSJZ34MDLGIFTWJXQPLNSK23S/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Steve Holden
This whole thread would be an excellent justification for following 3.9
with 4.0. It's as near as we ever want to get to a breaking change, and a
major version number would indicate the need to review. If increasing
strictness of escape code interpretation in string literals is the only
incompatibility there would surely be general delight.

Kind regards,
Steve Holden


On Wed, Aug 7, 2019 at 8:19 PM eryk sun  wrote:

> On 8/7/19, Steve Dower  wrote:
> >
> > * change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to
> > append (or chain) an extra message when either of the filenames contains
> > control characters (or change OSError to do it, or the default
> > sys.excepthook)
>
> On a related note for Windows, if the error is specifically
> ERROR_INVALID_NAME, we could extend this to look for and warn about
> the five reserved wildcard characters (asterisk, question mark, double
> quote, less than, greater than), pipe, and colon. It's only sometimes
> the case for colon because it's allowed in device names and used as
> the name and type delimiter for stream names.
>
> Kernel object names don't reserve wildcard characters, pipe, and
> colon. So I wouldn't want anything but the control-character warning
> if it's say ERROR_FILE_NOT_FOUND. An example would be
> SharedMemory(name='Global\test'), or a similar error for registry key
> and value names such as OpenKey(hkey, 'spam\test'), that is if winreg
> were updated to include the name in the exception. Note that forward
> slash is just a name character in these cases, not a path separator,
> so we have to use backslash, even if just via replace('/', '\\').
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/UFMVFL4QDUXLZFBWVW4YLAKPHQ6LTPDK/
>
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KM2IVRWN5QPLCFHJ5FUWZ6XB7DW2VONS/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread eryk sun
On 8/7/19, Steve Dower  wrote:
>
> * change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to
> append (or chain) an extra message when either of the filenames contains
> control characters (or change OSError to do it, or the default
> sys.excepthook)

On a related note for Windows, if the error is specifically
ERROR_INVALID_NAME, we could extend this to look for and warn about
the five reserved wildcard characters (asterisk, question mark, double
quote, less than, greater than), pipe, and colon. It's only sometimes
the case for colon because it's allowed in device names and used as
the name and type delimiter for stream names.

Kernel object names don't reserve wildcard characters, pipe, and
colon. So I wouldn't want anything but the control-character warning
if it's say ERROR_FILE_NOT_FOUND. An example would be
SharedMemory(name='Global\test'), or a similar error for registry key
and value names such as OpenKey(hkey, 'spam\test'), that is if winreg
were updated to include the name in the exception. Note that forward
slash is just a name character in these cases, not a path separator,
so we have to use backslash, even if just via replace('/', '\\').
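
A rough sketch of such a check, assuming it would only run after the OS has already reported ERROR_INVALID_NAME. explain_invalid_name is a hypothetical helper, not an existing CPython function:

```python
# Characters named above: the five wildcard characters, plus pipe and colon.
WILDCARDS = '*?"<>'
OTHER_RESERVED = '|:'


def explain_invalid_name(name):
    """Return hints about characters that may have made `name` invalid."""
    hints = []
    controls = sorted({ch for ch in name if ord(ch) < 32})
    if controls:
        hints.append("control characters: " + ", ".join(map(repr, controls)))
    reserved = sorted({ch for ch in name if ch in WILDCARDS + OTHER_RESERVED})
    if reserved:
        hints.append("reserved characters: " + " ".join(reserved))
    return hints


# 'Global\test' as an ordinary literal contains a tab, not a backslash and "t".
print(explain_invalid_name('Global\test'))
print(explain_invalid_name('report?.txt'))
```

For the ERROR_FILE_NOT_FOUND case only the control-character part would apply, since kernel object names do not reserve the other characters.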
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UFMVFL4QDUXLZFBWVW4YLAKPHQ6LTPDK/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread brian . skinn
Steven D'Aprano wrote:

> Because our processes don't work the way we assumed, it turns out that 
> in practice we haven't given developers the deprecation period we 
> thought we had. Read Nathaniel's post, if you haven't already done so:
> https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74O...
> He makes a compelling case that while we might have had the promised 
> deprecation period by the letter of the law, in practice most developers 
> will have never seen it, and we will be breaking the spirit of the 
> promise if we continue with the unmodified plan.
> ...
> I'm sure that the affected devs will understand why it was their fault
> they couldn't see the warnings, when even people from a first-class
> library like SymPy took four iterations to do it right.
>
> > Currently it
> > requires some extra steps or flags, which are not well known. What
> > change are you proposing for 3.8 that will ensure that this actually
> > gets solved?
>
> Absolutely nothing. I don't have to: we're an entire community, this
> doesn't have to fall only on my shoulders. I'm not even the messenger:
> that's Raymond. I'm just (partly) agreeing with him.
>
> Just because I don't have a solution for this problem doesn't mean the
> problem doesn't exist.

As the SymPy team has figured out the right pytest incantation to expose these 
warnings, perhaps a feature request on pytest to encapsulate that mix of 
options into a single flag would be a good idea?
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WOJQTOXMYKHLQO4KICEIZH3PDEMQLMBL/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Steve Dower

On 07Aug2019 0247, Chris Angelico wrote:
> On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano  wrote:
> > What's the rush? Let's be objective here: what benefit are we going to
> > get from this change? Is there anyone hanging out desperately for "\d"
> > and "\-" to become SyntaxErrors, so they can... do what?
>
> So that problems can start to be detected. Time and again, Python
> users on Windows get EXTREMELY confused by the way their code worked
> perfectly with one path, then bizarrely fails with another. That is a
> very real problem, and the problem is that it appeared to work when
> actually it was wrong.
> [...]
> If you can offer a better plan, then by all means, do so. But
> deferring without a change is of no real value, and it means ANOTHER
> eighteen months added onto the time before novice programmers get to
> be told about string literal problems.


Allow me to offer one:

* change the SyntaxWarning into a default-silenced one that fires every 
time a .pyc is loaded (this is the hard part, but it's doable)
* change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to 
explicitly warn when the path contains control characters
* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to 
append (or chain) an extra message when either of the filenames contains 
control characters (or change OSError to do it, or the default 
sys.excepthook)
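
The second bullet could look roughly like this. It is a sketch of the idea only: warn_on_control_chars is a hypothetical function, and nothing here reflects what pathlib or os.fsencode actually do:

```python
import warnings


def warn_on_control_chars(path):
    """Warn if a path string contains ASCII control characters."""
    found = sorted({ch for ch in path if ord(ch) < 32})
    if found:
        warnings.warn(
            f"path {path!r} contains control characters "
            f"({', '.join(map(repr, found))}); was a raw string intended?",
            UserWarning,
            stacklevel=2,
        )
    return path


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_control_chars('C:\temp\new.txt')   # tab and newline sneak in
    warn_on_control_chars(r'C:\temp\new.txt')  # raw string: nothing to report

print(len(caught))
```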


I don't care whether the changes are applied to all platforms rather 
than just Windows, but since Windows developers hit the problem and 
(some) Linux developers like to use control characters in filenames, I 
can see a justification for only warning on Windows.


Long term we can still deprecate and eventually block unrecognized 
escape sequences, but the long standing behaviour can stand for a few 
more years without creating more harm.


Cheers,
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2ICLCF5T53DBPVZPVHMT2XTXL64QF7WW/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Joao S. O. Bueno
From what I can see, the majority of new users in an interactive environment
who see the warning will do so because the incorrect string is
in _their_ code. The benefits are immediate, as people change to either
using raw strings or using forward slashes for file paths.

The examples at the beginning of this thread, where changing
a file path to "C:\users" suddenly breaks code, speak for themselves:
this is a _fix_. Broken libraries will be fixed within weeks of a Py 3.8
release. People will either be using an old install, with Python 3.7, or
they keep everything up to date, and for those, after 2 months max, the
library warnings will be all but gone.

In the meantime, what is possible is to publicize more how to disable these
warnings on the end-user side, since we all agree that few people know how
to do that.
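
One way to do that from Python code, sketched under the assumption that the warning text contains the phrase "invalid escape sequence" (the category has been DeprecationWarning or SyntaxWarning depending on the release). The filter has to be installed before the affected module is compiled; the filename below is a placeholder:

```python
import warnings

source = "pattern = '\\d+'\n"  # source text containing the literal '\d+'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only this one kind of warning, whatever its category.
    warnings.filterwarnings("ignore", message=r".*invalid escape sequence")
    compile(source, "<third-party module>", "exec")

print(len(caught))  # 0: the escape warning was filtered out
```

Category-wide filters such as -W ignore::SyntaxWarning are the command-line counterpart, at the cost of hiding other warnings of that category.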

On Wed, 7 Aug 2019 at 06:51, Chris Angelico  wrote:

> On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano 
> wrote:
> > What's the rush? Let's be objective here: what benefit are we going to
> > get from this change? Is there anyone hanging out desperately for "\d"
> > and "\-" to become SyntaxErrors, so they can... do what?
>
> So that problems can start to be detected. Time and again, Python
> users on Windows get EXTREMELY confused by the way their code worked
> perfectly with one path, then bizarrely fails with another. That is a
> very real problem, and the problem is that it appeared to work when
> actually it was wrong.
>
> Python has a history of fixing these problems. It used to be that
> b"\x61\x62\x63\x64" was equal to u"abcd", but now Python sees these as
> fundamentally different. Data-dependent bugs caused by a syntactic
> oddity are a language flaw that needs to be fixed.
>
> > Because our processes don't work the way we assumed, it turns out that
> > in practice we haven't given developers the deprecation period we
> > thought we had. Read Nathaniel's post, if you haven't already done so:
> >
> > https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74OBYEY3PVLNQG2ZAVRO653LD5K/
> >
> > He makes a compelling case that while we might have had the promised
> > deprecation period by the letter of the law, in practice most developers
> > will have never seen it, and we will be breaking the spirit of the
> > promise if we continue with the unmodified plan.
>
> Yes, that's a fair complaint. But merely pushing the deprecation back
> by a version is not solving it. There has to be SOMETHING done
> differently.
>
> > And yet here we are rushing through a breaking change in an accelerated
> > manner, for a change of marginal benefit.
>
> It's not a marginal benefit. For people who try to teach Python on
> multiple operating systems, this is a very very real benefit. Just
> because YOU don't see the benefit doesn't mean it isn't there.
>
> > > Otherwise, all you're doing is saying "I wish this
> > > problem would just go away".
> >
> > No, I'm saying we don't have to rush this into 3.8. Let's keep the
> > warning silent and push everything back a release.
> >
> > Now is better than never.
> > Although never is often better than *right* now.
>
> Not sure how the Zen supports what you're saying there, since you're
> specifically saying "not never, not now, just later". But what do you
> actually mean by not rushing this into 3.8?
>
> > Right now, we're looking at a seriously compromised user-experience for
> > 3.8. People are going to hate these warnings, many of them won't know
> > what to do with them and will be sure that Python is buggy, and for very
> > little benefit.
>
> Then the problem is that people blame Python for these warnings. That
> is a problem to be solved; we need people to understand that a warning
> emitted by a library is a *library bug* not a language flaw.
>
> > > Library authors can start _right now_ fixing their code so it's more
> > > 3.8 compatible.
> >
> > Provided that (1) they are aware that this is a problem that needs to be
> > fixed, and (2) they have the round tuits to actually fix it by 3.8.0.
> > Neither are guaranteed.
>
> (1) Yes it is, see above; (2) fair point, but this is restricted to
> string literals and can be detected simply by compiling the code, so
> it's a reasonably findable problem.
>
> > > ("More" because 3.8 doesn't actually break anything.)
> > > What is actually gained by waiting longer
> >
> > We gain the avoidance of a painful experience in 3.8 for a significant
> > number of users and third-party devs.
> >
> > The question we haven't had answered is what we gain by pushing through
> > with the original plan. Plenty of people have said "Let's just do it"
> > but as far as I can see not one has explained *why* we should put end-
> > users and library developers through this frustrating and annoying
> > rushed deprecation period.
>
> And unless you have a plan to do something different in 3.8 that
> ensures that library devs see the warnings, there's no justification
> for the delay. All you'll do is defer the 

[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Chris Angelico
On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano  wrote:
> What's the rush? Let's be objective here: what benefit are we going to
> get from this change? Is there anyone hanging out desperately for "\d"
> and "\-" to become SyntaxErrors, so they can... do what?

So that problems can start to be detected. Time and again, Python
users on Windows get EXTREMELY confused by the way their code worked
perfectly with one path, then bizarrely fails with another. That is a
very real problem, and the problem is that it appeared to work when
actually it was wrong.

Python has a history of fixing these problems. It used to be that
b"\x61\x62\x63\x64" was equal to u"abcd", but now Python sees these as
fundamentally different. Data-dependent bugs caused by a syntactic
oddity are a language flaw that needs to be fixed.
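
The bytes/text split mentioned above, as it behaves on Python 3 with default interpreter flags:

```python
data = b"\x61\x62\x63\x64"   # four bytes
text = "abcd"                # four characters

print(data == text)                  # bytes and str do not compare equal
print(data.decode("ascii") == text)  # an explicit decode makes them comparable
```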

> Because our processes don't work the way we assumed, it turns out that
> in practice we haven't given developers the deprecation period we
> thought we had. Read Nathaniel's post, if you haven't already done so:
>
> https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74OBYEY3PVLNQG2ZAVRO653LD5K/
>
> He makes a compelling case that while we might have had the promised
> deprecation period by the letter of the law, in practice most developers
> will have never seen it, and we will be breaking the spirit of the
> promise if we continue with the unmodified plan.

Yes, that's a fair complaint. But merely pushing the deprecation back
by a version is not solving it. There has to be SOMETHING done
differently.

> And yet here we are rushing through a breaking change in an accelerated
> manner, for a change of marginal benefit.

It's not a marginal benefit. For people who try to teach Python on
multiple operating systems, this is a very very real benefit. Just
because YOU don't see the benefit doesn't mean it isn't there.

> > Otherwise, all you're doing is saying "I wish this
> > problem would just go away".
>
> No, I'm saying we don't have to rush this into 3.8. Let's keep the
> warning silent and push everything back a release.
>
> Now is better than never.
> Although never is often better than *right* now.

Not sure how the Zen supports what you're saying there, since you're
specifically saying "not never, not now, just later". But what do you
actually mean by not rushing this into 3.8?

> Right now, we're looking at a seriously compromised user-experience for
> 3.8. People are going to hate these warnings, many of them won't know
> what to do with them and will be sure that Python is buggy, and for very
> little benefit.

Then the problem is that people blame Python for these warnings. That
is a problem to be solved; we need people to understand that a warning
emitted by a library is a *library bug* not a language flaw.

> > Library authors can start _right now_ fixing their code so it's more
> > 3.8 compatible.
>
> Provided that (1) they are aware that this is a problem that needs to be
> fixed, and (2) they have the round tuits to actually fix it by 3.8.0.
> Neither are guaranteed.

(1) Yes it is, see above; (2) fair point, but this is restricted to
string literals and can be detected simply by compiling the code, so
it's a reasonably findable problem.
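
A rough sketch of what "detected simply by compiling the code" could look like. The helper name escape_warnings is hypothetical, and the filter assumes the warning arrives as a DeprecationWarning or SyntaxWarning, which has varied between releases:

```python
import warnings


def escape_warnings(source, filename="<example>"):
    """Compile source text and report compile-time warnings as strings."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones hidden by default.
        warnings.simplefilter("always")
        try:
            compile(source, filename, "exec")
        except SyntaxError as exc:
            # An interpreter that has made the escape a hard error.
            return [f"SyntaxError: {exc.msg}"]
    return [
        f"{w.category.__name__}: {w.message}"
        for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
    ]


bad = escape_warnings(r'pattern = "\d+"')    # plain literal
good = escape_warnings(r'pattern = r"\d+"')  # raw literal

print(bad)
print(good)
```

Nothing has to be imported or run for the check to fire; compiling the source is enough.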

> > ("More" because 3.8 doesn't actually break anything.)
> > What is actually gained by waiting longer
>
> We gain the avoidance of a painful experience in 3.8 for a significant
> number of users and third-party devs.
>
> The question we haven't had answered is what we gain by pushing through
> with the original plan. Plenty of people have said "Let's just do it"
> but as far as I can see not one has explained *why* we should put end-
> users and library developers through this frustrating and annoying
> rushed deprecation period.

And unless you have a plan to do something different in 3.8 that
ensures that library devs see the warnings, there's no justification
for the delay. All you'll do is defer the exact same problem by
another eighteen months. If the warning remains silent in 3.8, how
will library devs get any indication that they need to fix something?

If you can offer a better plan, then by all means, do so. But
deferring without a change is of no real value, and it means ANOTHER
eighteen months added onto the time before novice programmers get to
be told about string literal problems.

ChrisA
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RISO4KSTHBMQZJT5XFS34GCB2PB66WNV/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Paul Moore
On Wed, 7 Aug 2019 at 10:32, Steven D'Aprano  wrote:

> No, I'm saying we don't have to rush this into 3.8. Let's keep the
> warning silent and push everything back a release.
>
> Now is better than never.
> Although never is often better than *right* now.
>
> Right now, we're looking at a seriously compromised user-experience for
> 3.8. People are going to hate these warnings, many of them won't know
> what to do with them and will be sure that Python is buggy, and for very
> little benefit.
>
> Let's slow down and put it off for another release, giving us time to
> solve the warnings problem, and library authors the deprecation period
> promised.

+1 from me. The arguments made here are pretty compelling to me, and I
agree that we should take a breath and not rush this warning into 3.8,
given what we now know.

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OXGY2MPRTK3BJAXCRVLFKKKQNREKO7O4/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Steven D'Aprano
On Wed, Aug 07, 2019 at 02:33:51PM +1000, Chris Angelico wrote:
> On Wed, Aug 7, 2019 at 1:54 PM Steven D'Aprano  wrote:
> > Don't think of this as a failure. Think of it as an opportunity: we've
> > identified a weakness in our deprecation process. Let's fix that
> > process, make sure that *developers* will see the warning in 3.8 or 3.9,
> > and not raise an exception until 4.0 or 4.1.
> >
> 
> So HOW are you going to make sure developers see it?

I've only just started thinking about it, give me a couple of minutes! *wink*

What's the rush? Let's be objective here: what benefit are we going to 
get from this change? Is there anyone hanging out desperately for "\d" 
and "\-" to become SyntaxErrors, so they can... do what?

Because our processes don't work the way we assumed, it turns out that 
in practice we haven't given developers the deprecation period we 
thought we had. Read Nathaniel's post, if you haven't already done so:

https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74OBYEY3PVLNQG2ZAVRO653LD5K/

He makes a compelling case that while we might have had the promised 
deprecation period by the letter of the law, in practice most developers 
will have never seen it, and we will be breaking the spirit of the 
promise if we continue with the unmodified plan.

Quite frankly, if we continue with the unmodified plan, third-party devs 
who are affected will have the right to feel mightily pissed off at us. 
We make an implicit, if not explicit, promise that we won't break 
backwards compatibility lightly, but if we do, we will give them plenty 
of notice except under the most dire circumstances (such as a serious 
security vulnerability).

And yet here we are rushing through a breaking change in an accelerated 
manner, for a change of marginal benefit. Sure, we can say that 
*technically* we gave them all the notice promised, it was at the bottom 
of a locked filing cabinet stuck in a disused lavatory with a sign on 
the door saying "Beware of The Leopard".

https://www.goodreads.com/quotes/40705-but-the-plans-were-on-display-on-display-i-eventually

I'm sure that the affected devs will understand why it was *their* fault 
they couldn't see the warnings, when even people from a first-class 
library like SymPy took four iterations to do it right.


> Currently it
> requires some extra steps or flags, which are not well known. What
> change are you proposing for 3.8 that will ensure that this actually
> gets solved? 

Absolutely nothing. I don't have to: we're an entire community, this 
doesn't have to fall only on my shoulders. I'm not even the messenger: 
that's Raymond. I'm just (partly) agreeing with him.

Just because I don't have a solution for this problem doesn't mean the 
problem doesn't exist.


> Otherwise, all you're doing is saying "I wish this
> problem would just go away".

No, I'm saying we don't have to rush this into 3.8. Let's keep the 
warning silent and push everything back a release.

Now is better than never.
Although never is often better than *right* now.

Right now, we're looking at a seriously compromised user-experience for 
3.8. People are going to hate these warnings, many of them won't know 
what to do with them and will be sure that Python is buggy, and for very 
little benefit.

Let's slow down and put it off for another release, giving us time to 
solve the warnings problem, and library authors the deprecation period 
promised.


> Library authors can start _right now_ fixing their code so it's more
> 3.8 compatible.

Provided that (1) they are aware that this is a problem that needs to be 
fixed, and (2) they have the round tuits to actually fix it by 3.8.0. 
Neither are guaranteed.

It's not a big fix, but people have other priorities, like work, family, 
a life, etc. That's why we normally give developers *multiple years* of 
warnings to fix problems, not weeks. This change is not so important 
that we have to push it through in an accelerated time frame.



> ("More" because 3.8 doesn't actually break anything.)
> What is actually gained by waiting longer

We gain the avoidance of a painful experience in 3.8 for a significant 
number of users and third-party devs.

The question we haven't had answered is what we gain by pushing through 
with the original plan. Plenty of people have said "Let's just do it" 
but as far as I can see not one has explained *why* we should put end- 
users and library developers through this frustrating and annoying 
rushed deprecation period.




-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/I6DFONZPRHL4VYUYICAXIMUTR4KVVHV6/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Serhiy Storchaka

07.08.19 03:57, Gregory P. Smith wrote:
> People distribute code via PyPI. If we reject uploads of packages with
> these problems and link to fixers (modernize can be taught what to do),
> we prevent them from spreading further.


How can we check that there are such problems in the package? Pass all 
*.py files through a linter? But the package can contain "incorrect" 
files, for example files for Python 2 or earlier Python 3 versions. Even 
the CPython test suite contains bad Python files for testing purposes.
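
A minimal sketch of the difficulty, assuming a checker that simply compiles every *.py file under a directory. The names classify and scan_tree and the three sample files are invented for illustration; this is not how PyPI works:

```python
import tempfile
import warnings
from pathlib import Path


def classify(path):
    """Return 'clean', 'warns' or 'does-not-compile' for one source file."""
    source = path.read_bytes()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, str(path), "exec")
        except (SyntaxError, ValueError):
            # For example Python 2 only code, or a deliberately broken
            # test fixture: not necessarily a problem in the package.
            return "does-not-compile"
    # Note: any compile-time DeprecationWarning or SyntaxWarning counts,
    # not only the ones about escape sequences.
    flagged = any(
        issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        for w in caught
    )
    return "warns" if flagged else "clean"


def scan_tree(root):
    """Classify every *.py file below root, keyed by file name."""
    return {p.name: classify(p) for p in sorted(Path(root).rglob("*.py"))}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "clean.py").write_text('x = r"\\d"\n')          # raw literal
    (root / "warns.py").write_text('x = "\\d"\n')           # plain literal
    (root / "py2_only.py").write_text('print "hello"\n')    # Python 2 syntax
    report = scan_tree(root)

print(report)
```

The checker cannot tell from the result alone whether a file that fails to compile is a real defect or an intentional fixture, which is the objection raised above.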

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JX7IDIGFLAZIF2YQIR5IYNNHLLHGRA4T/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Serhiy Storchaka

07.08.19 03:31, Rob Cliffe via Python-Dev wrote:
> How about: whenever a third-party library uses a potentially-wrong
> escape sequence, it creates a message on the console. Then when
> someone sees that message, they can post a bug report against the
> package.

Wouldn't that just increase the amount of noise? The main complaint 
about the new warnings is the noise.

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VZHWNZ4X7PXXE4Y4XIZCLMWSYGNJ5WPY/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Serhiy Storchaka

07.08.19 01:37, Brett Cannon wrote:
> I think this is a good example of how the community is not running tests with
> warnings on and making sure that their code is warnings-free. This warning has
> existed for at least one full release and fixing it doesn't require some crazy
> work-around for backwards compatibility, and so this tells me people are simply
> either ignoring the warnings or they are not aware of them.

Several PRs fixing warnings appear on GitHub every month. And it 
seems the deprecation warning about importing ABCs from collections is at 
least as common (if not more so) as the warning about "invalid escape 
sequences". The former is more visible to end users because it is emitted 
on every run, not only at the first bytecode compilation.

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7TYOKDS3D5YXKJFBJO6G6OVKVRYKRCHO/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Serhiy Storchaka

06.08.19 20:37, Paul Moore wrote:
> I don't see issues reported in the bug trackers for docutils and
> bottle. Maybe as a start, someone could raise issues there?


The warning in docutils was fixed.
https://sourceforge.net/p/docutils/code/8255/
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JPKBMJ6WH5HQGUDND3JZCLGRQ2KKSEPN/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-06 Thread Chris Angelico
On Wed, Aug 7, 2019 at 1:54 PM Steven D'Aprano  wrote:
> Don't think of this as a failure. Think of it as an opportunity: we've
> identified a weakness in our deprecation process. Let's fix that
> process, make sure that *developers* will see the warning in 3.8 or 3.9,
> and not raise an exception until 4.0 or 4.1.
>

So HOW are you going to make sure developers see it? Currently it
requires some extra steps or flags, which are not well known. What
change are you proposing for 3.8 that will ensure that this actually
gets solved? Otherwise, all you're doing is saying "I wish this
problem would just go away".

Library authors can start _right now_ fixing their code so it's more
3.8 compatible. ("More" because 3.8 doesn't actually break anything.)
What is actually gained by waiting longer, and how do you propose to
make this transition easier?

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UGDPMFKXJRU2CPKPAI5NLHDNH3VG6BWN/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-06 Thread Steven D'Aprano
On Tue, Aug 06, 2019 at 07:58:12PM -0700, Nathaniel Smith wrote:

> For example, all my projects run tests with deprecation warnings
> enabled and warnings turned into errors, but I never saw any of these
> warnings. What happens is: the warning is issued when the .py file is
> byte-compiled; but at this point, deprecation warnings probably aren't
> visible. Later on, when pytest imports the file, it has warnings
> enabled... but now the warning isn't issued.

This!

If Nathaniel's analysis is correct, and I think it is, we've identified 
a flaw in our deprecation process. We've assumed that library devs will 
see the warnings long before end users.
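
The effect Nathaniel describes can be reproduced with a small sketch. It assumes an interpreter where the invalid escape is still only a warning; the module name demo_bad_escape and the helper warnings_seen are made up for the example:

```python
import importlib
import py_compile
import sys
import tempfile
import warnings
from pathlib import Path


def warnings_seen(action):
    """Run action() and return the escape-style warnings it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
    return [
        w for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
    ]


with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_bad_escape.py"
    # The file on disk contains the literal "\d+" (an invalid escape).
    module_path.write_text('PATTERN = "\\d+"\n')

    # Step 1: byte-compile, as an installer typically does. The warning
    # is issued here, where usually nobody is looking.
    at_compile = warnings_seen(
        lambda: py_compile.compile(str(module_path), doraise=True)
    )

    # Step 2: import with every warning enabled, as a test runner might.
    # The cached bytecode is loaded, so the source is not compiled again.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        at_import = warnings_seen(
            lambda: importlib.import_module("demo_bad_escape")
        )
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_bad_escape", None)

print(len(at_compile), len(at_import))
```

The warning belongs to the compile step, so a test suite that only ever imports already-compiled files has no chance to see it.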

Since the benefit of this breaking change is quite small, let's delay it 
long enough to fix the deprecation process.


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/D4BLHVLQL4KR5QGAUMOONVH7MJ4ZMR2L/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-06 Thread Steven D'Aprano
On Wed, Aug 07, 2019 at 10:14:08AM +1000, Chris Angelico wrote:
> On Wed, Aug 7, 2019 at 10:03 AM Steven D'Aprano  wrote:
> > - Keep the SyntaxWarning silent by default for 3.8. That gives us
> > another year or more to gently pressure third-party libraries to fix
> > their code, and to find ways to encourage developers to run with
> > warnings enabled.
> 
> How do you propose to apply this pressure?

We already have some good information about the offending libraries.

(Remember, the libraries here haven't done anything wrong. They were
using a documented feature. We've just changed our mind about that
feature.)

Raymond mentioned two, docutils and bottle, and Matt did a scan of the 
top 100 downloads on PyPI. We can start by reporting this as a bug to 
them.

 
> How about: whenever a third-party library uses a potentially-wrong
> escape sequence, it creates a message on the console. Then when
> someone sees that message, they can post a bug report against the
> package.

You're right, of course, and if we were talking about one or two 
warnings a week, affecting a handful of users, I don't think Raymond 
would have said anything. But apparently this is a widespread problem 
with common third party libraries. That means it's going to affect lots 
of people.

We have a few problems:

- The people affected will mostly be the end users, not the developers.

- These sorts of SyntaxWarnings are scary and intimidating to beginners,
  even when they are harmless. Many of them will not know how to silence 
  warnings, or who to report it as a bug to.

- Since end users rarely search for existing bug reports before 
  adding a new one, or upgrade to the latest version, we're 
  effectively sentencing the third-party library authors to be
  flooded with potentially dozens of identical bug reports long after
  they have fixed the issue.

- I expect that many end users will report it as a *Python* bug, so
  we're going to share some of that pain too.

- The benefit of the desired change is relatively low.

The intention was for the developers, not end users, to see the warning. 
If end users see more than a tiny number of these warnings, our plan 
failed. That's okay: since the benefit of the breaking change is small, 
we can rethink the plan, delay the breaking change, and try to come up 
with a better system that ensures developers see these warnings before 
their users do.

We're not fixing a major security issue here, or adding a new feature 
that will make people's code enormously better. We're breaking people's 
code to force them to write "better" code, so that *maybe* some day in 
the future we can add new escape sequences.

That's a really small benefit for breaking backwards compatibility. 
We don't break backwards compatibility lightly because of the knock-on 
effects of code churn, libraries that stop working, frustrated users, 
obsoleted blog posts and books, questions asked on Stack Overflow etc. 
When the benefit is small, we require the pain to be correspondingly 
small. That's not going to be the case if we continue with the plan.

Don't think of this as a failure. Think of it as an opportunity: we've 
identified a weakness in our deprecation process. Let's fix that 
process, make sure that *developers* will see the warning in 3.8 or 3.9, 
and not raise an exception until 4.0 or 4.1.

I know people just want to get it over and done with, I do too. But we 
have responsibilities to the community, and we've lived with the current 
behaviour for 25+ years; another 2-3 years won't kill us.


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NY5XTN2QYKYAZFDKSX5TSEO3SY4WOXYU/

