[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2020-04-16 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

If the behavior is obviously wrong (like in issue25054), we can fix it without 
warnings, and even backport the fix to older versions, because we do not expect 
that anybody depends on such weird behavior. If we are going to change the 
behavior, but expect that users can depend on the current behavior, we emit a 
FutureWarning first (and we did it for other changes in re). But this issue is 
the hard one. Before 3.7 we did not know that it is related to issue25054. We 
were not going to change this behavior (at least not in the near future). But 
when a fix for issue25054 was written, we saw that it is the same issue. We did 
not want to keep the bug from issue25054 for a few more versions, so we changed 
the behavior in this issue without warnings. It was an exceptional case.

This change was documented in the module documentation and in "What's New in 
Python 3.7" (section "Porting to Python 3.7"). If this is not enough, we will be 
happy to get help to make it better.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2020-04-16 Thread Mark Borgerding


Mark Borgerding  added the comment:

@serhiy.storchaka  Thanks for the link to issue25054 to clarify this change was 
not done solely for aesthetics.
Hopefully that will mollify others like me who find their way to this 
discussion as they try to figure out why their code broke with a new version of 
python.


I wish it had been done in a more staged and overt way, but that is just 
spitting in the wind at this point.


Thanks for all your work, my gripe du jour notwithstanding.

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2020-04-16 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

The former implementation was wrong. See issue25054 which contains more obvious 
examples of that bug:

>>> re.sub(r"\b|:+", "-", "a::bc")
'-a-:-bc-'

Not all colons were replaced despite the fact that the pattern matches all 
colons.
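
For comparison, a minimal sketch of the same call on Python 3.7 or later, where 
the fix is in place. The output noted in the comment assumes the 3.7+ matching 
rules discussed in this issue; on 3.6 and earlier the result is the one quoted 
above.

```python
import re

# Same pattern and input as in the example above, run on Python 3.7+.
result = re.sub(r"\b|:+", "-", "a::bc")

# The ':+' alternative now consumes both colons after the empty \b match,
# and the empty \b match right after the colons is replaced as well.
print(result)  # prints -a---bc- on 3.7+
```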

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2020-04-16 Thread Mark Borgerding


Mark Borgerding  added the comment:

So third-party code was knowingly broken to satisfy an aesthetic notion that 
substitution should be more like iteration.

Would not a FutureWarning have been a kinder way to stage this implementation?

A foolish consistency, indeed.

--
nosy: +Mark Borgerding




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2020-01-20 Thread Anders Hovmöller

Anders Hovmöller  added the comment:

We were also bitten by this. In fact we still run a compatibility shim in 
production where we log if the new and old behavior are different. We also 
didn't think this "bug fix" made sense or was treated with the appropriate 
gravity in the release notes. 

I understand the logic in the bug tracker, and that it matches other languages 
is good. But the behavior also makes no sense for the .* case, unfortunately. 
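
A minimal sketch of what such a logging wrapper could look like. It is not the 
shim described above: the names hits_changed_case, sub_logged and the logger 
name are hypothetical, and it only flags the case named in this issue's title 
(an empty match directly after a non-empty match), not every difference 
between 3.6 and 3.7.

```python
import logging
import re

log = logging.getLogger("re_compat")  # hypothetical logger name


def hits_changed_case(pattern, string):
    """Return True if an empty match directly follows a non-empty match."""
    prev = None
    for m in re.finditer(pattern, string):
        is_empty = m.start() == m.end()
        if (is_empty and prev is not None
                and prev.start() != prev.end()
                and prev.end() == m.start()):
            return True
        prev = m
    return False


def sub_logged(pattern, repl, string):
    """Call re.sub() and log inputs whose result may differ from 3.6."""
    if hits_changed_case(pattern, string):
        log.warning("re.sub(%r, ...) on %r hits the 3.7 empty-match change",
                    pattern, string)
    return re.sub(pattern, repl, string)
```

On Python 3.7+, sub_logged('(.*)', 'foo', 'asd') would log a warning, while the 
same call with '(.+)' would not.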

> On 21 Jan 2020, at 05:56, David Barnett  wrote:
> 
> 
> David Barnett  added the comment:
> 
> We were also bitten by this behavior change in 
> https://github.com/google/vroom/issues/110. I'm kinda baffled by the new 
> behavior and assumed it had to be an accidental regression, but I guess not. 
> If you have any other context on the BDFL conversation and reasoning for 
> calling this behavior correct, I'd love to see additional info.
> 

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2020-01-20 Thread David Barnett


David Barnett  added the comment:

We were also bitten by this behavior change in 
https://github.com/google/vroom/issues/110. I'm kinda baffled by the new 
behavior and assumed it had to be an accidental regression, but I guess not. If 
you have any other context on the BDFL conversation and reasoning for calling 
this behavior correct, I'd love to see additional info.

--
nosy: +mu_mind




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2019-04-12 Thread Matthew Barnett


Matthew Barnett  added the comment:

Consider re.findall(r'.{0,2}', 'abcde').

It finds 'ab', then continues where it left off to find 'cd', then 'e'.

It can also find ''; re.match(r'.*', '') does match, after all.

It could, in fact, find an infinite number of ''.

And what about re.match(r'()*', '')?

What should it do? Run forever? Raise an exception?

At some point you have to make a decision as to what should happen, and the 
general consensus has been to match once.
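
The "match once" rule can be seen by listing each match with its span. This is 
only an illustrative sketch using re.finditer() on the same pattern and input 
as above:

```python
import re

# Collect every match together with its (start, end) span.
matches = [(m.group(), m.span()) for m in re.finditer(r".{0,2}", "abcde")]

for text, span in matches:
    print(repr(text), span)
# 'ab' (0, 2)
# 'cd' (2, 4)
# 'e' (4, 5)
# '' (5, 5)   <- exactly one empty match at the end of the string
```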

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2019-04-12 Thread Anders Hovmöller

Anders Hovmöller  added the comment:

That might be true, but that seems like a weak argument. If anything, it means 
those others are broken. What is the logic behind "(.*)" returning the entire 
string (which is what you asked for) and exactly one empty string? Why not two 
empty strings? 3? 4? 5? Why not an empty string at the beginning? It makes no 
practical sense.

We will have to spend considerable effort to work around this change and adapt 
our code to 3.7. The lack of a discussion about backwards compatibility in 
this, and the other, thread before making this change is also a problem I think.

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2019-04-11 Thread Matthew Barnett


Matthew Barnett  added the comment:

It's now consistent with Perl, PCRE and .Net (C#), as well as re.split(), 
re.sub(), re.findall() and re.finditer().

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2019-04-11 Thread Anders Hovmöller

Anders Hovmöller  added the comment:

Just as a comparison, sed does the 3.6 thing:

> echo foo | sed 's/\(.*\)/x\1y/g'
xfooy

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2019-04-11 Thread Anders Hovmöller

Anders Hovmöller  added the comment:

This was a really bad idea in my opinion. We just found this and we have no way 
to know how this will impact production. It's really absurd that 

re.sub('(.*)', r'foo', 'asd')

is "foo" in python 1 to 3.6 but 'foofoo' in python 3.7.

--
nosy: +Anders.Hovmöller




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2018-01-04 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2018-01-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset fbb490fd2f38bd817d99c20c05121ad0168a38ee by Serhiy Storchaka in 
branch 'master':
bpo-32308: Replace empty matches adjacent to a previous non-empty match in 
re.sub(). (#4846)
https://github.com/python/cpython/commit/fbb490fd2f38bd817d99c20c05121ad0168a38ee


--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2017-12-26 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Could anybody please make a review of at least the documentation part?

--




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2017-12-13 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +4734
stage:  -> patch review




[issue32308] Replace empty matches adjacent to a previous non-empty match in re.sub()

2017-12-13 Thread Serhiy Storchaka

New submission from Serhiy Storchaka :

Currently re.sub() replaces empty matches only when not adjacent to a previous 
match. This makes it inconsistent with re.findall() and re.finditer(), which 
find empty matches adjacent to a previous non-empty match, and with other RE 
engines.

The proposed PR makes all functions that perform repeated searching 
(re.split(), re.sub(), re.findall(), re.finditer()) mutually consistent.

The PR changes the behavior of re.split() too, but this doesn't matter, since 
it is already different from the 3.6 behavior.

The BDFL has approved this change.

This change doesn't break any stdlib code. It is expected that it will not 
break much third-party code, and even if it does break some code, that code can 
be easily rewritten. For example, replacing re.sub('(.*)', ...) (which now 
matches an empty string at the end of the string) with re.sub('(.+)', ...) is 
an obvious fix.
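
A short sketch of that rewrite, assuming Python 3.7+ and using 'foo' as a 
placeholder replacement. Note that the two patterns are not equivalent on an 
empty input string, which may matter when porting:

```python
import re

text = "asd"

# '(.*)' matches 'asd' and then the empty string at the end (3.7+).
star = re.sub(r"(.*)", "foo", text)

# '(.+)' needs at least one character, so the trailing empty match is gone.
plus = re.sub(r"(.+)", "foo", text)

print(star)  # prints foofoo on 3.7+
print(plus)  # prints foo

# Caveat: on an empty string, '(.*)' still matches once and '(.+)' never does.
star_empty = re.sub(r"(.*)", "foo", "")  # 'foo'
plus_empty = re.sub(r"(.+)", "foo", "")  # ''
```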

--
assignee: serhiy.storchaka
components: Library (Lib), Regular Expressions
messages: 308229
nosy: ezio.melotti, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Replace empty matches adjacent to a previous non-empty match in re.sub()
type: enhancement
versions: Python 3.7
