05.12.17 01:21, MRAB wrote:
I've finally come to a conclusion as to what the "correct" behaviour of zero-width matches should be: """always return the first match, but never a zero-width match that is joined to a previous zero-width match""".

If it's about to return a zero-width match that's joined to a previous zero-width match, then backtrack and keep on looking for a match.

Example:

 >>> print([m.span() for m in re.finditer(r'|.', 'a')])
[(0, 0), (0, 1), (1, 1)]

re.findall, re.split and re.sub should work accordingly.

If re.finditer finds n matches, then re.split should return a list of n+1 strings and re.sub should make n replacements (excepting maxsplit, etc.).
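A minimal sketch of how this invariant could be checked, assuming Python 3.7 or later (where re.split() splits on empty matches) and patterns without capturing groups. The helper name match_counts is hypothetical and is not part of the re module.

```python
import re

def match_counts(pattern, string):
    """Return (matches, pieces, replacements) for one pattern and string.

    Hypothetical helper for illustration only; assumes the pattern has no
    capturing groups, since re.split() would otherwise add the groups to
    its result.
    """
    n_matches = sum(1 for _ in re.finditer(pattern, string))
    n_pieces = len(re.split(pattern, string))
    _, n_replacements = re.subn(pattern, '-', string)
    return n_matches, n_pieces, n_replacements

for pattern, string in [(r'x*', 'abxc'), (r'|.', 'a')]:
    n, pieces, replacements = match_counts(pattern, string)
    # With consistent behavior, pieces == n + 1 and replacements == n.
    print(repr(pattern), repr(string), n, pieces, replacements)
```

On versions where the three functions disagree about empty matches, the printed counts will not line up, which is exactly the inconsistency under discussion.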

We now have a good opportunity to change a long-standing behavior of re.sub(). Currently, empty matches are prohibited if adjacent to a previous match. For consistency with re.finditer() and re.findall(), with regex.sub() with the VERSION1 flag, and with Perl, PCRE and other engines, they should be prohibited only if adjacent to a previous *empty* match. Currently re.sub('x*', '-', 'abxc') returns '-a-b-c-', but it will return '-a-b--c-' if we change the behavior.
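The proposed result can be illustrated by building a substitution on top of re.finditer(), which already allows an empty match right after a non-empty one. This is only a sketch: sub_via_finditer is a hypothetical helper, it treats the replacement as a literal string (no backreferences), and it is not how re.sub() is implemented.

```python
import re

def sub_via_finditer(pattern, repl, string):
    """Replace every match reported by re.finditer() with a literal repl.

    Hypothetical helper for illustration; it ignores backreferences and
    the count argument that the real re.sub() supports.
    """
    parts = []
    last = 0
    for m in re.finditer(pattern, string):
        parts.append(string[last:m.start()])  # text between matches
        parts.append(repl)                    # one replacement per match
        last = m.end()
    parts.append(string[last:])               # trailing text
    return ''.join(parts)

print(sub_via_finditer('x*', '-', 'abxc'))  # '-a-b--c-'
```

The doubled '-' comes from the non-empty match of 'x' followed immediately by the empty match just before 'c'.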

This behavior was already changed once, unintentionally and temporarily, between 2.1 and 2.2, when the underlying implementation of re was switched from PCRE to SRE. But the former behavior was quickly restored (see https://bugs.python.org/issue462270). Ironically, the behavior of the current PCRE is different.

Possible options:

1. Change the behavior right now.
2. Start emitting a FutureWarning and change the behavior in a future version.
3. Keep the status quo forever.

We need to make a decision right now, since in the first two cases we should change the behavior of re.split() right now as well. Its behavior changes in 3.7 in any case, and it is better to change it once than to break it in two different releases.

The changed detail is so subtle that no regular expressions in the stdlib or its tests are affected, except for the special-purpose test added to guard the current behavior.
