[issue25054] Capturing start of line '^'

2018-03-14 Thread Serhiy Storchaka
Change by Serhiy Storchaka: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed

[issue25054] Capturing start of line '^'

2017-12-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 70d56fb52582d9d3f7c00860d6e90570c6259371 by Serhiy Storchaka in branch 'master': bpo-25054, bpo-1647489: Added support of splitting on zerowidth patterns. (#4471)

[issue25054] Capturing start of line '^'

2017-12-02 Thread Matthew Barnett
Matthew Barnett added the comment: findall() and finditer() consist of multiple uses of search(), basically, as do sub() and split(), so we want the same rule to apply to them all.
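A small check of the point above, assuming Python 3.7 or later (where the change discussed in this thread is in place): the spans reported by finditer() are enough to predict what findall(), split() and sub() return. The pattern and sample string are borrowed from the re documentation; the variable names are only illustrative.

```python
import re

pattern = r"x*"
text = "abxd"

# Spans of every match, empty ones included.
spans = [m.span() for m in re.finditer(pattern, text)]

# findall() returns the matched substrings for the same spans.
found = re.findall(pattern, text)
from_spans = [text[start:end] for start, end in spans]

# split() returns the text lying between consecutive matches.
cuts = [0] + [end for _, end in spans]
stops = [start for start, _ in spans] + [len(text)]
pieces_from_spans = [text[a:b] for a, b in zip(cuts, stops)]
pieces = re.split(pattern, text)

# sub() replaces exactly those matches, so joining the pieces with the
# replacement string gives the same result.
replaced = re.sub(pattern, "-", text)
replaced_from_spans = "-".join(pieces_from_spans)

print(spans)
print(found == from_spans)
print(pieces == pieces_from_spans)
print(replaced == replaced_from_spans)
```

If the four functions did not share one matching rule, at least one of the comparisons would be expected to print False.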

[issue25054] Capturing start of line '^'

2017-12-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Avoiding a zero-width match (ZWM) after a nonzero-width match (NWM) in re.sub() is explicitly documented (and the documentation is correct in this case). This follows the behavior of the ancient RE implementation. It was once broken in sre, but then fixed (see

[issue25054] Capturing start of line '^'

2017-12-02 Thread Matthew Barnett
Matthew Barnett added the comment: The pattern: \b|:+ will match a word boundary (zero-width) before colons, so if there's a word followed by colons, finditer will find the boundary and then the colons. You _can_ get a zero-width match (ZWM) joined to the

[issue25054] Capturing start of line '^'

2017-12-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The clause "Empty matches are included in the result unless they touch the beginning of another match" was added in 2f3e5483a3324b44fa5dbbb98859dc0ac42b6070 (issue732120) and I suppose it never was correct. So we can ignore it

[issue25054] Capturing start of line '^'

2017-12-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Good point. Neither the old nor the new behavior (which matches regex) conforms to the documentation: "Empty matches are included in the result unless they touch the beginning of another match." It is easy to exclude empty matches that

[issue25054] Capturing start of line '^'

2017-12-02 Thread Serhiy Storchaka
Change by Serhiy Storchaka: pull_requests: +4586

[issue25054] Capturing start of line '^'

2017-12-02 Thread Martin Panter
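Martin Panter added the comment: The new “finditer” behaviour seems to contradict the documentation about excluding empty matches if they touch the start of another match.
>>> list(re.finditer(r"\b|:+", "a::bc"))
[<re.Match object; span=(0, 0), match=''>, <re.Match object; span=(1, 1), match=''>, <re.Match object; span=(1, 3), match='::'>, <re.Match object; span=(3, 3), match=''>, <re.Match object; span=(5, 5), match=''>]
An empty match at (1, 1) is included, despite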
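To see which empty matches the comment refers to, the spans can be listed directly. This is a minimal sketch, assuming Python 3.7 or later; `nonempty_starts` and `touching` are names introduced here for illustration only.

```python
import re

# Spans of each match for the pattern and string quoted in the comment.
spans = [m.span() for m in re.finditer(r"\b|:+", "a::bc")]
print(spans)

# Start positions of the non-empty matches.
nonempty_starts = {start for start, end in spans if end > start}

# Empty matches that sit exactly where a non-empty match begins.
touching = [(start, end) for start, end in spans
            if start == end and start in nonempty_starts]
print(touching)
```

On the versions assumed above, `touching` holds the single span (1, 1), which is the empty match that the documented rule would have excluded.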

[issue25054] Capturing start of line '^'

2017-12-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could anybody please review at least the documentation part? I want to merge this before 3.7.0a3 is released. Initially I was going to backport the part that relates to findall(), finditer() and sub(). It changes the

[issue25054] Capturing start of line '^'

2017-11-19 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: PR 4471 fixes this issue, issue1647489, and a couple of similar issues. The most visible change is the change in re.split(). This is a compatibility-breaking change, and it affects third-party code. But ValueError or
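A short illustration of the re.split() change, assuming Python 3.7 or later. The first call is the example given in the re documentation; the second string and the variable names are made up for this sketch.

```python
import re

# A word boundary is a zero-width pattern; split() now cuts at each one.
by_boundary = re.split(r"\b", "Words, words, words.")
print(by_boundary)
# ['', 'Words', ', ', 'words', ', ', 'words', '.']

# A zero-width lookahead: cut before every capital letter.
by_capital = re.split(r"(?=[A-Z])", "splitOnCamelCase")
print(by_capital)
# ['split', 'On', 'Camel', 'Case']
```

Code written for earlier versions that relied on such patterns not splitting may need to be checked, which is the compatibility concern raised in the comment.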

[issue25054] Capturing start of line '^'

2017-11-19 Thread Serhiy Storchaka
Change by Serhiy Storchaka: keywords: +patch; pull_requests: +4403; stage: -> patch review

[issue25054] Capturing start of line '^'

2017-11-16 Thread Serhiy Storchaka
Change by Serhiy Storchaka: assignee: -> serhiy.storchaka; nosy: +serhiy.storchaka; versions: +Python 2.7, Python 3.7 -Python 3.5

[issue25054] Capturing start of line '^'

2016-01-01 Thread Ezio Melotti
Ezio Melotti added the comment: AFAIU the problem is at Modules/_sre.c:852: after matching, if the ptr is still at the start position, the start position gets incremented to avoid an endless loop. Ideally the problem could be avoided by marking and skipping the part(s) of the pattern that

[issue25054] Capturing start of line '^'

2015-09-10 Thread Matthew Barnett
Matthew Barnett added the comment: After matching '^', it advances so that it won't find the same match again (and again and again...). Unfortunately, that means that it sometimes misses some matches. It's a known issue.
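The skipping described above can be imitated in pure Python. This is only a rough sketch of the idea, not the actual C code in _sre; `old_style_spans` is a hypothetical helper written for this illustration, and the comparison with re.finditer() assumes Python 3.7 or later.

```python
import re


def old_style_spans(pattern, text):
    """Imitate the old scan: after an empty match, skip one character."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        # An empty match would be found again at the same position, so
        # the loop steps one character past it. That step is what can
        # hide a non-empty match starting at the same position.
        pos = m.end() + 1 if m.start() == m.end() else m.end()
    return spans


skipping = old_style_spans("^|a", "a")
current = [m.span() for m in re.finditer("^|a", "a")]
print(skipping)
print(current)
```

With the input "a", the imitation reports only the empty match at (0, 0), while the current re.finditer() also reports 'a' at (0, 1).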

[issue25054] Capturing start of line '^'

2015-09-10 Thread R. David Murray
R. David Murray added the comment: ^ finds an empty match at the beginning of the string, $ finds an empty match at the end. I don't see the bug (but I'm not a regex expert).

[issue25054] Capturing start of line '^'

2015-09-10 Thread Alcolo Alcolo
New submission from Alcolo Alcolo: Why is re.findall('^|a', 'a') != ['', 'a']? We have: re.findall('^|a', ' a') == ['', 'a'] and re.findall('$|a', ' a') == ['a', '']. Capturing '^' takes the first character. It looks like a bug ... -- components: Regular Expressions messages: 250364 nosy:
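The three calls from the report can be rerun as a quick check. The results noted here assume Python 3.7 or later, where the fix from this issue has landed; on earlier versions the first call is the one that differs. The variable names are illustrative.

```python
import re

# The three calls quoted in the report.
first = re.findall("^|a", "a")
second = re.findall("^|a", " a")
third = re.findall("$|a", " a")

print(first)
print(second)
print(third)
```

On the versions assumed above, the first call returns ['', 'a'], which is the result the reporter expected.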

[issue25054] Capturing start of line '^'

2015-09-10 Thread Alcolo Alcolo
Alcolo Alcolo added the comment: Naively, I thought that ^ would be considered a 0-length token (like $, \b, \B), so that after capturing it we could read the next token: 'a' (for the input string "a"). I use a simple workaround: prepending ' ' to my string (because ' ' is neutral with my

[issue25054] Capturing start of line '^'

2015-09-10 Thread Matthew Barnett
Matthew Barnett added the comment: Just to confirm, it _is_ a bug. It tries to avoid getting stuck, but the way it does that causes it to skip a character, sometimes missing a match it should have found.