Re: [Python-Dev] Zero-width matching in regexes

Serhiy Storchaka Wed, 06 Dec 2017 06:18:09 -0800

06.12.17 15:37, Paul Moore wrote:

Behaviour (1) means that we'd get


>>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION1)
'xx xx'

(because \w* matches the empty string after each word, as well as each
word itself). I just tested in Perl, and that is indeed what happens
there as well.


Yes, because in this case you need to use `\w+`, not `\w*`.
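
A minimal sketch of that point, using only the standard `re` module and the sample string from the example above: `\w+` cannot match the empty string, so its result does not depend on how empty matches are handled. The variable names are just for illustration.

```python
import re

text = 'hello world'

# \w+ needs at least one word character, so it never produces an empty
# match and the result is the same under either behaviour.
plus_result = re.sub(r'\w+', 'x', text)
print(plus_result)  # x x

# \w* can also match the empty string; what happens at those positions
# depends on the Python version, so no expected output is given here.
star_result = re.sub(r'\w*', 'x', text)
print(star_result)
```

If the two printed lines differ, the extra replacements come from the empty matches of `\w*`.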

No CPython tests would fail if re.sub() were changed to behaviour (2), except the tests just added in 3.7 and the one test specifically meant to guard the old behavior. But I don't know how much third-party code would break if this change were made.

On that basis, I have to say that I find behaviour (2) more intuitive
and (arguably) "correct":

>>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION0)
'x x'
>>> re.sub(r'\w*', 'x', 'hello world')
'x x'

The actual behavior of re.sub() and regex.sub() in the VERSION0 mode was a weird behavior (4).


>>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION0)
'[]h[ello] []w[orld]'
>>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION1)
'[][hello][] [][world][]'
>>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.6, behavior (4)
'[]h[ello] []w[orld]'
>>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.7, behavior (2)
'[][hello] [][world]'
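
To check which matches a particular interpreter actually produces for this pattern, one option is to list the match spans with `re.finditer`. This is only an illustrative sketch: `match_spans` is a hypothetical helper name, not part of `re` or `regex`, and the printed list may differ between Python versions, so no expected output is shown.

```python
import re

def match_spans(pattern, text):
    """Return (start, end, matched_text) for every match found in text."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

spans = match_spans(r'(\b|\w+)', 'hello world')
for start, end, matched in spans:
    # A match with start == end is a zero-width (empty) match.
    kind = 'empty' if start == end else 'non-empty'
    print(start, end, repr(matched), kind)
```

Comparing this list with the re.sub() output on the same interpreter shows which empty matches were replaced.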

