[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2013-01-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 2bc04449fd8c by Ezio Melotti in branch '2.7':
#13899: \A, \Z, and \B now correctly match the A, Z, and B literals when used 
inside character classes (e.g. '[\A]').  Patch by Matthew Barnett.
http://hg.python.org/cpython/rev/2bc04449fd8c

New changeset 081db241ccda by Ezio Melotti in branch '3.2':
#13899: \A, \Z, and \B now correctly match the A, Z, and B literals when used 
inside character classes (e.g. '[\A]').  Patch by Matthew Barnett.
http://hg.python.org/cpython/rev/081db241ccda

New changeset 17b1eb4a8144 by Ezio Melotti in branch '3.3':
#13899: merge with 3.2.
http://hg.python.org/cpython/rev/17b1eb4a8144

New changeset 35ece2465936 by Ezio Melotti in branch 'default':
#13899: merge with 3.3.
http://hg.python.org/cpython/rev/35ece2465936
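
A quick way to see which behaviour a given interpreter has is to probe it. The sketch below is an illustration only and is not part of the patch; `probe` is a hypothetical helper name. It catches `re.error` on the assumption that Python versions newer than the ones discussed in this thread may reject `[\A]` outright rather than treat it as a literal.

```python
import re

def probe(pattern, text):
    """Return re.findall() results, or an 'error: ...' string if the pattern is rejected."""
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        return "error: %s" % exc

# [A] is an ordinary one-letter class on every version.
print(probe(r'[A]', 'AAA'))
# With the changesets above applied, [\A] is meant to behave like [A].
print(probe(r'[\A]', 'AAA'))
```

If the second line prints an error string, the interpreter refuses the escape instead of guessing, which is the alternative discussed further down the thread.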

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13899
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2013-01-10 Thread Ezio Melotti

Ezio Melotti added the comment:

Fixed, thanks for the report John, and for the patch Matthew!

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2013-01-08 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
assignee:  -> ezio.melotti
stage: needs patch -> patch review




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2013-01-07 Thread Matthew Barnett

Matthew Barnett added the comment:

I've attached a patch.

--
keywords: +patch
Added file: http://bugs.python.org/file28614/issue13899.patch




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2013-01-06 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
stage:  -> needs patch
versions: +Python 3.3, Python 3.4




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-02-04 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

In re, \A within a character set should be similar to \C, but instead it's 
still interpreted as meaning the start of the string. That's definitely a bug.

If it doesn't do what it's supposed to do, then it's a bug.

regex tries to be backwards compatible with re but fix such bugs.

The only buggy behaviour which it retains in its version 0 (compatible) 
behaviour is not splitting on a zero-width match, and that's only because GvR 
believes that some existing code which uses re may rely on that behaviour. In 
its version 1 (extended) behaviour it does split on a zero-width match.

--




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-02-03 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

Does anyone have regex installed, to see what it does?

--
nosy: +terry.reedy




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-02-03 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

This should answer that question:

>>> re.findall(r"[\A\C]", r"\AC")
['C']
>>> regex.findall(r"[\A\C]", r"\AC")
['A', 'C']

The behaviour of regex is intended to match that of re for backwards 
compatibility.

--




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-02-03 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

I presume you intend regex to match the spec rather than bugs. So if re has a 
bug in an obscure corner case and the spec is ambiguous, as I have the 
impression is the case here, using the interpretation embodied in regex would 
avoid creating a conflict.

--




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-31 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Rule 1 makes sense, but it's not entirely obvious (people might consider 
bBaAzZ special too).

The "normal Python rules for backslash escapes but revert to the C behaviour of 
stripping the \ from unrecognised escapes" is not obvious either, and from 
r'[\A]' people might expect:
  1) the same as \A (beginning of the string);
  2) a letter 'A';
  3) a '\' or a letter 'A' (especially if they write it as '[\\A]').

This is why I suggested raising an error (and refusing the temptation to guess), 
but on the other hand, if you consider 'A' a normal letter like 'C', having 
an error for \A would be inconsistent.
It would have been better if \C raised an error too (I don't see why that would 
appear in a regex, since re.escape doesn't escape C and the user has no reason 
to add the \), but now it's too late for that.
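
The re.escape remark can be checked directly. A minimal sketch, using only the standard library:

```python
import re

# re.escape() leaves ASCII letters alone, so it never produces
# \A, \B, \C or \Z on its own.
for letter in "ABCZ":
    assert re.escape(letter) == letter

# Characters with a special meaning in a pattern are escaped.
print(re.escape("A.B"))  # prints A\.B
```

So a \C or \A inside a pattern has to have been typed by hand.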

--




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-31 Thread Jesús Cea Avión

Changes by Jesús Cea Avión j...@jcea.es:


--
nosy: +jcea




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread Georg Brandl

Georg Brandl ge...@python.org added the comment:

r'[\w]' also matches word chars.  I find that a very useful property, since you 
can easily build classes like '[\w.]'.  It's also impossible to change this 
without breaking lots of regexes.  It's also explicitly documented, although 
IMO it's not clear it extends to \A and \Z, since it talks about character 
classes.  So this is a docs issue.
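
For reference, a small sketch of the '[\w.]' style of class mentioned above (the sample string is made up for illustration):

```python
import re

# \w keeps its "word character" meaning inside [], so it can be
# combined with other members such as a literal dot.
print(re.findall(r'[\w.]+', 'foo.bar baz-qux'))  # ['foo.bar', 'baz', 'qux']
```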

--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python, georg.brandl




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

[\w] should definitely work, but [\B] doesn't seem to match anything useful, 
and it just fails silently because it's neither equivalent to \B nor to [B]:
>>> re.match(r'foo\B', 'foobar')  # on a non-word-boundary -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[B]', 'fooBar')  # same as r'fooB'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\B]', 'foobar')  # not equivalent to \B
>>> re.match(r'foo[\B]', 'fooBar')  # not equivalent to [B]

The same is true for \Z and \A:
>>> re.match(r'foo\Z', 'foo')  # end of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[Z]', 'fooZ')  # same as r'fooZ'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\Z]', 'foo')  # not equivalent to \Z
>>> re.match(r'foo[\Z]', 'fooZ')  # not equivalent to [Z]

>>> re.match(r'\Afoo', 'foo')  # beginning of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'[A]foo', 'Afoo')  # same as r'Afoo'
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'[\A]foo', 'foo')  # not equivalent to \A
>>> re.match(r'[\A]foo', 'Afoo')  # not equivalent to [A]

Inside [], \b switches from word boundary to backspace:
>>> re.match(r'foo\b', 'foobar')  # not on a word boundary -- no matches
>>> re.match(r'foo\b', 'foo bar')  # on a word boundary  -- matches fine
<_sre.SRE_Match object at 0xb74a4ec8>
>>> re.match(r'foo[\b]', 'foo bar')  # not equivalent to \b
>>> re.match(r'foo[\b]', 'foo\bbar')  # matches backspace
<_sre.SRE_Match object at 0xb76dd3d8>
>>> re.match(r'foo([\b])', 'foo\bbar').group(1)
'\x08'

Given that \b doesn't keep its word boundary meaning inside the [], \B (and \A 
and \Z) shouldn't keep it either (also because I can't see how having these 
inside [] would be of any use).
On the other hand I'm not sure they should be equivalent to B, A, Z either.  
There are several escape sequences in the form \X (where X is an upper- or 
lower-case letter) that are not equivalent to X (\a\b\d\f\s\x\w\D\S\W...).
Raising an error that says something like "I don't think [\A] does what you 
think it does, use [A] instead." might be a better option (and in case anyone 
is wondering about re.escape, I just checked and it doesn't escape letters).  
Even if this is technically backward incompatible, any string that has \A, \B, 
\Z inside [] can be considered buggy IMHO (unless someone can come up with a 
valid use case where they do something useful).
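
The \b part of this comparison does not depend on the bug under discussion, so it can be written as a self-checking script. A minimal sketch using only the standard library:

```python
import re

# Outside a class, \b is a zero-width word boundary.
assert re.match(r'foo\b', 'foo bar') is not None
assert re.match(r'foo\b', 'foobar') is None

# Inside a class, \b is the backspace character (0x08).
m = re.match(r'foo([\b])', 'foo\bbar')
assert m is not None and m.group(1) == '\x08'

# ...and it no longer acts as a word boundary there.
assert re.match(r'foo[\b]', 'foo bar') is None
print("all \\b checks passed")
```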

--
assignee: docs@python -> 




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread Georg Brandl

Georg Brandl ge...@python.org added the comment:

Interesting. That shifts the issue, since the current behavior is neither of 
the two that make sense. Then it would indeed make the most sense to raise in 
these cases.

(I wonder what these patterns actually would match, but I have no time to look 
in the sre sources right now...)

--




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread John Machin

John Machin sjmac...@lexicon.net added the comment:

@Ezio: Comparison of the behaviour of \letter inside/outside character classes 
is irrelevant. The rules for inside can be expressed simply as:

1. Letters dDsSwW are special; they represent categories as documented, and do 
in fact have a similar meaning outside character classes.

2. Otherwise normal Python rules for backslash escapes in string literals 
should be followed. This means automatically that \a -> \x07, \A -> A, \b -> 
backspace, \B -> B, \z -> z and \Z -> Z.
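
The two rules can be sketched as a small lookup. This is a hypothetical illustration of the proposal only: `class_escape`, `CATEGORY_LETTERS` and `STRING_ESCAPES` are made-up names, this is not how the re parser is written, and multi-character escapes such as \x and octal are left out.

```python
# Hypothetical illustration of the two rules above; not the re module's code.
CATEGORY_LETTERS = set("dDsSwW")

# Single-letter escapes shared with Python string literals
# (\x, octal and similar multi-character forms are omitted).
STRING_ESCAPES = {
    "a": "\x07", "b": "\x08", "f": "\x0c", "n": "\n",
    "r": "\r", "t": "\t", "v": "\x0b",
}

def class_escape(letter):
    """Classify backslash + letter as read inside [...] under the proposal."""
    if letter in CATEGORY_LETTERS:
        return ("category", "\\" + letter)  # rule 1
    # Rule 2: a known string-literal escape, otherwise drop the backslash.
    return ("literal", STRING_ESCAPES.get(letter, letter))

for letter in "aAbBzZw":
    print(letter, class_escape(letter))
```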

@Georg: No need to read the source, just read my initial posting: It's compiled 
as a zero-length matcher ("at") inside a character class ("in") i.e. a 
nonsense, then at runtime the illegality is deliberately ignored.

--




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread John Machin

John Machin sjmac...@lexicon.net added the comment:

Whoops: "normal Python rules for backslash escapes" should have had a note "but 
revert to the C behaviour of stripping the \ from unrecognised escapes" which 
is what re appears to do in its own \ handling.

--




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-28 Thread John Machin

New submission from John Machin sjmac...@lexicon.net:

Expected behaviour illustrated using "C":

>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6E78>
>>> re.compile(r'C', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6F08>

Incorrect behaviour exhibited by "A" (and by "B" and "Z"):

>>> re.findall(r'[\A]', 'AAA')
[]
>>> re.compile(r'A', 128)
literal 65
<_sre.SRE_Pattern object at 0x01FC6F98>
>>> re.compile(r'[\A]', 128)
in
  at at_beginning_string  FAIL 
<_sre.SRE_Pattern object at 0x01FDF0B0>


Also there is no self-checking at runtime; the switch default has a comment to 
the effect that nothing can be done, so pretend that the unknown opcode matched 
nothing. Zen?

--
messages: 152194
nosy: sjmachin
priority: normal
severity: normal
status: open
title: re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and 
Z.
type: behavior
versions: Python 2.7, Python 3.2




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-28 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

This happens because \A, \B and \Z are valid escape sequences[0].
If what you mean is that they shouldn't be recognized as such inside a 
character class, then I can agree with that.
^ and $ are similar to \A and \Z but they are considered as literals inside []. 
 I think the same could also be applied to \b and \B, unless you expect r'[\b]' 
to match the same as r'\b'.
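
A short sketch of the ^ and $ comparison (the sample string is made up for illustration). One detail worth keeping in mind is that ^ negates the class when it comes first, so it is placed second here:

```python
import re

# Not in first position, ^ is a literal inside []; $ is a literal there too.
print(re.findall(r'[$^]', 'a^b$c'))  # ['^', '$']
```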

(On an unrelated note, it's preferable to avoid using ints as flags -- using 
re.DEBUG is better)
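
A minimal sketch of the named-flag form. The debug dump goes to standard output and its layout differs between Python versions, so no expected dump is shown:

```python
import re

# re.DEBUG names the flag instead of relying on its integer value.
pattern = re.compile(r'[C]', re.DEBUG)  # prints the parsed pattern
print(pattern.findall('CCC'))
```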

[0]: http://docs.python.org/library/re.html#regular-expression-syntax

--
nosy: +ezio.melotti, mrabarnett




[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-28 Thread John Machin

John Machin sjmac...@lexicon.net added the comment:

@ezio: Of course the context is inside a character class.

I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that 
is the treatment applied to all other C-like control char escapes (2) the docs 
say so explicitly: "Inside a character range, \b represents the backspace 
character, for compatibility with Python’s string literals."

--
