[Python-Dev] Adding new escapes to regex module

2022-08-16 Thread MRAB
Other regex implementations have escape sequences for horizontal 
whitespace (`\h` and `\H`) and vertical whitespace (`\v` and `\V`).


The regex module already supports `\h`, but I can't use `\v` because it 
represents `\x0b`, as it does in the re module.
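
To illustrate the conflict, here is a minimal sketch using only the stdlib re module. The variable names are made up for the example. It shows that `\v` as a pattern escape matches the single character U+000B and nothing else, so it can't double as a "vertical whitespace" class without changing existing behaviour.

```python
import re

# In the stdlib re module, the pattern escape \v denotes the single
# character U+000B (VERTICAL TAB), not a class of characters.
vt = re.compile(r'\v')

# VERTICAL TAB itself is matched.
matches_vt = vt.fullmatch('\x0b') is not None

# A newline is vertical whitespace, but it is not U+000B, so no match.
matches_newline = vt.fullmatch('\n') is not None

print(matches_vt, matches_newline)
```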


Now that someone has asked for it, I'm trying to find a nice way of 
adding it, and I'm currently thinking that maybe I could use `\y` and 
`\Y` instead as they look a little like `\v` and `\V`, and, also, 
vertical whitespace is sort-of in the y-direction.
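
For a rough idea of what such an escape would match, the sketch below spells the class out by hand with the stdlib re module. The membership is an assumption borrowed from how PCRE and Perl define `\v` (LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR); the regex module might settle on something different, and the names `vertical` and `non_vertical` are hypothetical.

```python
import re

# Assumed membership, following PCRE/Perl's \v:
# LF, VT, FF, CR, NEL (U+0085), LINE SEPARATOR, PARAGRAPH SEPARATOR.
vertical = re.compile(r'[\n\x0b\f\r\x85\u2028\u2029]')

# The negated class plays the role of the upper-case variant.
non_vertical = re.compile(r'[^\n\x0b\f\r\x85\u2028\u2029]')

# Split a string on any single vertical-whitespace character.
text = 'one\ntwo\x0bthree\u2028four'
parts = vertical.split(text)

space_is_non_vertical = non_vertical.fullmatch(' ') is not None
cr_is_non_vertical = non_vertical.fullmatch('\r') is not None

print(parts, space_is_non_vertical, cr_is_non_vertical)
```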


As far as I can tell, only PostgreSQL uses them, and, even then, it's 
for what everyone else writes as `\b` and `\B`.


I want the regex module to remain compatible with the re module, in case 
they get added there sometime in the future.


Opinions?
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AYOYEAFOJW4ZHVYBDVMH4MWKXNLBBJ62/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Adding new escapes to regex module

2022-08-16 Thread Barry Scott



> On 16 Aug 2022, at 21:24, MRAB wrote:
> 
> Other regex implementations have escape sequences for horizontal whitespace 
> (`\h` and `\H`) and vertical whitespace (`\v` and `\V`).
> 
> The regex module already supports `\h`, but I can't use `\v` because it 
> represents `\x0b`, as it does in the re module.

You seem to be mixing the use of \ as the escape for strings and the \ that 
re uses.
Is it the behaviour that '\' becomes '\\' that means this is 
a breaking change?

Won't this work?
```
re.compile('\v:\\v')
# which is the same as
re.compile(r'\x0b:\v')
```

Barry

> Now that someone has asked for it, I'm trying to find a nice way of adding 
> it, and I'm currently thinking that maybe I could use `\y` and `\Y` instead 
> as they look a little like `\v` and `\V`, and, also, vertical whitespace is 
> sort-of in the y-direction.
> 
> As far as I can tell, only PostgreSQL uses them, and, even then, it's for 
> what everyone else writes as `\b` and `\B`.
> 
> I want the regex module to remain compatible with the re module, in case they 
> get added there sometime in the future.
> 
> Opinions?

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/R7MG2MKGXTIEXOAQDJ72LE2QLGDT7KNA/


[Python-Dev] Re: Adding new escapes to regex module

2022-08-16 Thread MRAB

On 2022-08-16 22:14, Barry Scott wrote:

> > On 16 Aug 2022, at 21:24, MRAB wrote:
> > 
> > Other regex implementations have escape sequences for horizontal whitespace (`\h` and `\H`) and vertical whitespace (`\v` and `\V`).
> > 
> > The regex module already supports `\h`, but I can't use `\v` because it represents `\x0b`, as it does in the re module.
> 
> You seem to be mixing the use of \ as the escape for strings and the \ that 
> re uses.
> Is it the behaviour that '\' becomes '\\' that means this is 
> a breaking change?
> 
> Won't this work?
> ```
> re.compile('\v:\\v')
> # which is the same as
> re.compile(r'\x0b:\v')
> ```

Some languages, e.g. Perl, have a dedicated syntax for writing regexes, 
and they take `\n` (a backslash followed by 'n') to mean "match a newline".


Other languages, including Python, write regexes as string literals, which 
can contain an actual newline, but they also take `\n` (a backslash 
followed by 'n') to mean "match a newline".


Thus:

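>>> print(re.match('\n', '\n')) # Literal newline.
<re.Match object; span=(0, 1), match='\n'>
>>> print(re.match('\\n', '\n')) # `\n` sequence.
<re.Match object; span=(0, 1), match='\n'>


On the other hand:

>>> print(re.match('\b', '\b')) # Literal backspace.
<re.Match object; span=(0, 1), match='\x08'>
>>> print(re.match('\\b', '\b')) # `\b` sequence, which means a word boundary.
None
>>>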

The problem is that the re and regex modules already have the `\v` (a 
backslash followed by 'v') sequence to mean "match the '\v' character", so:


re.compile('\v')

and:

re.compile('\\v')

mean exactly the same.
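
A minimal sketch of that equivalence, using only the stdlib re module (the variable names are just for illustration): the two Python strings differ, but as patterns both match U+000B and nothing else.

```python
import re

literal = '\v'    # Python turns this into the one-character string U+000B
escaped = '\\v'   # two characters, backslash and 'v'; re resolves the escape

# The strings themselves are different...
lengths = (len(literal), len(escaped))

# ...but as patterns they behave identically: both match VERTICAL TAB,
# and neither matches the letter 'v'.
both_match = all(re.fullmatch(p, '\x0b') for p in (literal, escaped))
neither_matches_v = not any(re.fullmatch(p, 'v') for p in (literal, escaped))

print(lengths, both_match, neither_matches_v)
```

So any new meaning for `\v` would change what the second form matches today.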


> > Now that someone has asked for it, I'm trying to find a nice way of adding it, and I'm currently thinking that maybe I could use `\y` and `\Y` instead as they look a little like `\v` and `\V`, and, also, vertical whitespace is sort-of in the y-direction.
> > 
> > As far as I can tell, only PostgreSQL uses them, and, even then, it's for what everyone else writes as `\b` and `\B`.
> > 
> > I want the regex module to remain compatible with the re module, in case they get added there sometime in the future.
> > 
> > Opinions?

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KHI74Y2JJRYFRBGGNJUSL7RZCBAI7IAN/