Re: Did Python 3.12 developers honestly broke special regexp sequences? (Was: hatop fails its autopkg tests with Python 3.12)

2024-02-14 Thread Stéphane Blondon

Hello Andreas,

On 13/02/2024 18:21, Andreas Tille wrote:

> I was constantly shaking my head about bug #1061802 featuring
> SyntaxWarnings like
>
> SyntaxWarning: invalid escape sequence '\.'
> 573s   CLI_INPUT_RE = re.compile('[a-zA-Z0-9_:\.\-\+; /#%]')
> 573s /tmp/autopkgtest.G4v4eK/autopkgtest_tmp/hatop.py:215:
> SyntaxWarning: invalid escape sequence '\s'
> 573s   'software_name':re.compile('^Name:\s*(?P\S+)'),




Python 3.12 now emits a SyntaxWarning when a string literal contains an
invalid escape sequence. (If I remember correctly, something similar has
already happened in the past.) It is only a warning, so the code should
still work properly.


If a string contains such 'invalid sequences' (as regular expressions
often do), you can declare it as a raw string by prefixing it with an
'r' character.


The source code should be:
'software_name':re.compile(r'^Name:\s*(?P\S+)'),
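
A minimal sketch of why the 'r' prefix is a safe change: the raw literal and the doubled-backslash literal hold exactly the same characters, so the regular expression itself is unchanged. The group name "name" and the sample input line are assumptions made for illustration; the real group name is not visible in this thread.

```python
import re

# Hypothetical group name "name": the group name of the original pattern
# is not shown in this thread, so this is an assumption for illustration.
raw_pattern = r'^Name:\s*(?P<name>\S+)'

# The same pattern without the r prefix needs every backslash doubled.
escaped_pattern = '^Name:\\s*(?P<name>\\S+)'

# Both literals produce exactly the same string.
assert raw_pattern == escaped_pattern

software_name = re.compile(raw_pattern)
match = software_name.match('Name:   HAProxy')  # made-up sample input
print(match.group('name'))  # prints "HAProxy"
```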


The fix comes from the second point in:
https://docs.python.org/3/whatsnew/3.12.html#other-language-changes



--
Stéphane




Re: Did Python 3.12 developers honestly broke special regexp sequences? (Was: hatop fails its autopkg tests with Python 3.12)

2024-02-13 Thread Simon McVittie
On Tue, 13 Feb 2024 at 18:21:17 +0100, Andreas Tille wrote:
> SyntaxWarning: invalid escape sequence '\.'
> 573s   CLI_INPUT_RE = re.compile('[a-zA-Z0-9_:\.\-\+; /#%]')

This should be:

re.compile(r'[a-zA-Z0-9_:\.\-\+; /#%]')
           ^

a raw string, where the backslashes are not interpreted by the
Python parser, allowing them to be passed through to the re module for
parsing; or alternatively

re.compile('[a-zA-Z0-9_:\\.\\-\\+; /#%]')
                        ^^ ^^ ^^

like you would have to write in the C equivalent.
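
As a quick check that the two spellings above are interchangeable, the following sketch compares them and tries a few single characters against the compiled pattern. The test characters are chosen for illustration only.

```python
import re

# Raw string: backslashes reach the re module untouched.
raw_form = r'[a-zA-Z0-9_:\.\-\+; /#%]'

# Regular string: each backslash is doubled for the Python parser.
doubled_form = '[a-zA-Z0-9_:\\.\\-\\+; /#%]'

# The two literals are the same string, so they compile to the same pattern.
assert raw_form == doubled_form

CLI_INPUT_RE = re.compile(raw_form)

# The escaped characters are matched literally.
assert CLI_INPUT_RE.fullmatch('.')
assert CLI_INPUT_RE.fullmatch('-')
assert CLI_INPUT_RE.fullmatch('+')

# Characters outside the class, including a backslash, are rejected.
assert CLI_INPUT_RE.fullmatch('!') is None
assert CLI_INPUT_RE.fullmatch('\\') is None
```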

Reference:

"""
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without
invoking their special meaning. This collides with Python’s usage
of the same character for the same purpose in string literals;
for example, to match a literal backslash, one might have to write
'\\\\' as the pattern string, because the regular expression must
be \\, and each backslash must be expressed as \\ inside a regular
Python string literal. Also, please note that any invalid escape
sequences in Python’s usage of the backslash in string literals
now generate a SyntaxWarning and in the future this will become a
SyntaxError. This behaviour will happen even if it is a valid escape
sequence for a regular expression.

The solution is to use Python’s raw string notation for regular
expression patterns; backslashes are not handled in any special way
in a string literal prefixed with 'r'. So r"\n" is a two-character
string containing '\' and 'n', while "\n" is a one-character string
containing a newline. Usually patterns will be expressed in Python
code using this raw string notation.
"""
—re module docs

> which leaves me scratching my head about what else we should write
> for "any kind of space" now in Python 3.12.

\s continues to be correct for "any kind of space", but Python now
complains if you do the backslash-escapes in the wrong layer of syntax.
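
To illustrate the "wrong layer" point, here is a rough sketch that compiles two tiny pieces of Python source and records the warnings raised by the Python compiler itself, before the re module is involved at all. The helper name escape_warnings and the source snippets are made up for this example. The warning category is SyntaxWarning on 3.12 and, as far as I know, DeprecationWarning on older releases, so the sketch accepts either.

```python
import re
import warnings


def escape_warnings(source):
    """Compile Python source text and return the escape-related warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        compile(source, '<example>', 'exec')
    return [w for w in caught
            if issubclass(w.category, (SyntaxWarning, DeprecationWarning))]


# Source text containing a regular literal with a bare \s in it.
plain_source = "import re\npattern = re.compile('\\s+')\n"

# The same source text with a raw string literal instead.
raw_source = "import re\npattern = re.compile(r'\\s+')\n"

# The regular literal triggers a warning at compile time; the raw one does not.
assert len(escape_warnings(plain_source)) >= 1
assert escape_warnings(raw_source) == []

# \s itself still means "any kind of whitespace" to the re module.
assert re.compile(r'\s+').fullmatch(' \t\n')
```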

smcv



Did Python 3.12 developers honestly broke special regexp sequences? (Was: hatop fails its autopkg tests with Python 3.12)

2024-02-13 Thread Andreas Tille
Hi,

I was constantly shaking my head about bug #1061802 featuring
SyntaxWarnings like

SyntaxWarning: invalid escape sequence '\.'
573s   CLI_INPUT_RE = re.compile('[a-zA-Z0-9_:\.\-\+; /#%]')
573s /tmp/autopkgtest.G4v4eK/autopkgtest_tmp/hatop.py:215: 
SyntaxWarning: invalid escape sequence '\s'
573s   'software_name':re.compile('^Name:\s*(?P\S+)'),

which even seems to contradict the Regular expression operations
documentation for Python 3.12 [1], where

\s
For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also 
many other characters, for example the non-breaking spaces mandated by 
typography rules in many languages).

Matches [ \t\n\r\f\v] if the ASCII flag is used.

For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set; this is 
equivalent to [ \t\n\r\f\v].


remains what I know \s to mean in regular expressions.  (I admit '\.'
is unusual inside [] and I think a plain '.' is appropriate here.)  I
tried simple things like


$ python3.12
Python 3.12.2 (main, Feb  7 2024, 20:47:03) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> software_name = re.compile('^Name:\s*(?P\S+)')
<stdin>:1: SyntaxWarning: invalid escape sequence '\s'
>>> software_name = re.compile('^Name:[\s\t\n]*(?P\S+)')
<stdin>:1: SyntaxWarning: invalid escape sequence '\s'


which leaves me scratching my head about what else we should write
for "any kind of space" now in Python 3.12.
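
On the side remark about '\.' inside []: a small sketch, under the assumption that only ASCII input matters, suggesting that the escapes in that character class can be dropped if the '-' is moved to the end. The name "simplified" is made up here and is not taken from hatop.

```python
import re

# The character class as it appears in the warning, written as a raw string.
original = re.compile(r'[a-zA-Z0-9_:\.\-\+; /#%]')

# Inside [], '.' and '+' are already literal, and '-' is literal when it
# comes last, so no backslashes are needed at all.
simplified = re.compile(r'[a-zA-Z0-9_:.+; /#%-]')

# Compare both classes on every ASCII character.
samples = [chr(code) for code in range(128)]
assert all(
    bool(original.fullmatch(c)) == bool(simplified.fullmatch(c))
    for c in samples
)
```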

Kind regards
   Andreas.

[1] https://docs.python.org/3/library/re.html

-- 
http://fam-tille.de