Roundup Robot added the comment:
New changeset 2bc04449fd8c by Ezio Melotti in branch '2.7':
#13899: \A, \Z, and \B now correctly match the A, Z, and B literals when used
inside character classes (e.g. [A]). Patch by Matthew Barnett.
http://hg.python.org/cpython/rev/2bc04449fd8c
New changeset
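A small sketch for checking the behaviour this changeset describes. The helper name `findall_or_error` is made up for illustration and is not part of the patch. Interpreters carrying the fix should match the plain letters. Later Python releases may reject letter escapes they do not recognise, so the sketch reports either outcome rather than assuming one.

```python
import re

def findall_or_error(pattern, text):
    """Return re.findall() results, or a message if the pattern is rejected."""
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        return "rejected: %s" % exc

# Try \A, \Z and \B inside a character class against the matching letter.
for letter in "AZB":
    pattern = "[\\" + letter + "]"
    print(pattern, findall_or_error(pattern, letter * 2))
```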
Ezio Melotti added the comment:
Fixed, thanks for the report John, and for the patch Matthew!
--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker rep...@bugs.python.org
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
assignee:  -> ezio.melotti
stage: needs patch -> patch review
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13899
___
Matthew Barnett added the comment:
I've attached a patch.
--
keywords: +patch
Added file: http://bugs.python.org/file28614/issue13899.patch
___
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
stage:  -> needs patch
versions: +Python 3.3, Python 3.4
___
Matthew Barnett pyt...@mrabarnett.plus.com added the comment:
In re, \A within a character set should be similar to \C, but instead it's
still interpreted as meaning the start of the string. That's definitely a bug.
If it doesn't do what it's supposed to do, then it's a bug.
regex tries to be
Terry J. Reedy tjre...@udel.edu added the comment:
Does anyone have regex installed, to see what it does?
--
nosy: +terry.reedy
___
Matthew Barnett pyt...@mrabarnett.plus.com added the comment:
This should answer that question:
>>> re.findall(r"[\A\C]", r"\AC")
['C']
>>> regex.findall(r"[\A\C]", r"\AC")
['A', 'C']
The behaviour of regex is intended to match that of re for backwards
compatibility.
--
Terry J. Reedy tjre...@udel.edu added the comment:
I presume you intend regex to match the spec rather than bugs. So if re has a
bug in an obscure corner case and the spec is ambiguous, as I have the
impression is the case here, using the interpretation embodied in regex would
avoid creating a
Ezio Melotti ezio.melo...@gmail.com added the comment:
Rule 1 makes sense, but it's not entirely obvious (people might consider
bBaAzZ special too).
The "normal Python rules for backslash escapes but revert to the C behaviour of
stripping the \ from unrecognised escapes" is not obvious
Changes by Jesús Cea Avión j...@jcea.es:
--
nosy: +jcea
___
Georg Brandl ge...@python.org added the comment:
r'[\w]' also matches word chars. I find that a very useful property, since you
can easily build classes like '[\w.]' It's also impossible to change this
without breaking lots of regexes. It's also explicitly documented, although
IMO it's not
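As a brief illustration of the class-building property described above (the sample string is invented for this sketch), \w keeps its word-character meaning inside the brackets, while the dot is just a literal dot there:

```python
import re

# \w stays a category inside the class; "." is a plain dot, not "any character".
dotted_name = re.compile(r'[\w.]+')

print(dotted_name.findall('os.path.join(a, b)'))
```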
Ezio Melotti ezio.melo...@gmail.com added the comment:
[\w] should definitely work, but [\B] doesn't seem to match anything useful,
and it just fails silently because it's neither equivalent to \B nor to [B]:
re.match(r'foo\B', 'foobar') # on a non-word-boundary -- matches fine
_sre.SRE_Match
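For reference, a short sketch of the two meanings being contrasted here, using only patterns whose behaviour is unambiguous: \B outside a class as a non-word-boundary assertion, and [B] as a class holding the letter B. The sample strings are chosen for illustration.

```python
import re

# Outside a class, \B asserts "not at a word boundary".
inside_word = re.match(r'foo\B', 'foobar')    # "foo" is followed by "b"
at_boundary = re.match(r'foo\B', 'foo bar')   # "foo" is followed by a space

print(inside_word is not None)
print(at_boundary is not None)

# A class containing the plain letter B matches that letter.
print(re.findall(r'[B]', 'ABBA'))
```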
Georg Brandl ge...@python.org added the comment:
Interesting. That shifts the issue, since the current behavior is neither of
the two that make sense. Then it would indeed make the most sense to raise in
these cases.
(I wonder what these patterns actually would match, but I have no time to
John Machin sjmac...@lexicon.net added the comment:
@Ezio: Comparison of the behaviour of \letter inside/outside character classes
is irrelevant. The rules for inside can be expressed simply as:
1. Letters dDsSwW are special; they represent categories as documented, and do
in fact have a
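Rule 1 can be checked with a small sketch. The helper name `same_inside_and_outside` and the sample string are assumptions made for this example. It compares each category escape used inside a one-element class with the same escape used on its own.

```python
import re

def same_inside_and_outside(letter, sample):
    """Compare a category escape inside a character class and outside one."""
    inside = re.findall("[\\" + letter + "]", sample)
    outside = re.findall("\\" + letter, sample)
    return inside == outside

sample = "a1 _\t-Z"
for letter in "dDsSwW":
    print(letter, same_inside_and_outside(letter, sample))
```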
John Machin sjmac...@lexicon.net added the comment:
Whoops: "normal Python rules for backslash escapes" should have had a note "but
revert to the C behaviour of stripping the \ from unrecognised escapes" which
is what re appears to do in its own \ handling.
--
New submission from John Machin sjmac...@lexicon.net:
Expected behaviour illustrated using C:
>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6E78>
>>> re.compile(r'C', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6F08>
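The flag value 128 used above is re.DEBUG, which makes re print the parsed pattern. Below is a sketch of the same comparison that captures the dump instead of reading it off the console. The helper name `debug_dump` is invented for this example, and [C] is used rather than [\C] because newer interpreters may refuse the \C escape. The exact dump text varies between Python versions.

```python
import contextlib
import io
import re

def debug_dump(pattern):
    """Compile with re.DEBUG and return the text that re prints while parsing."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()

print(int(re.DEBUG))        # numeric value of the debug flag
print(debug_dump(r'C'))     # a bare literal
print(debug_dump(r'[C]'))   # a one-element class holding the same literal
```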
Ezio Melotti ezio.melo...@gmail.com added the comment:
This happens because \A, \B and \Z are valid escape sequences[0].
If what you mean is that they shouldn't be recognized as such inside a
character class, then I can agree with that.
^ and $ are similar to \A and \Z but they are considered
John Machin sjmac...@lexicon.net added the comment:
@ezio: Of course the context is inside a character class.
I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that
is the treatment applied to all other C-like control char escapes (2) the docs
say so explicitly: Inside
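The \b case described here can be sketched as follows (the sample text is made up for illustration): inside a class \b is the backspace character, while outside a class it is the zero-width word-boundary assertion.

```python
import re

text = "ab\x08c"   # contains one backspace character

# Inside a class, \b is the backspace character U+0008.
backspaces = re.findall(r'[\b]', text)
print(backspaces == ['\x08'])

# Outside a class, \b is a word boundary and consumes no characters.
boundaries = re.findall(r'\b', text)
print(all(match == '' for match in boundaries))
```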