Vlastimil Brom wrote:
Vlastimil Brom <vlastimil.b...@gmail.com> added the comment:
I just tested the fix for unicode tracebacks and found some possibly weird
results (not sure how/whether it should be fixed, as these inputs are indeed
rather artificial...).
(Win XP SP3 Czech, Python 2.6.4)
Using the cmd console, the output is fine (for the characters it can accept and
display)
regex.findall(ur"\p{InBasicLatinĚ}", u"aé")
Traceback (most recent call last):
...
File "C:\Python26\lib\regex.py", line 1244, in _parse_property
raise error("undefined property name '%s'" % name)
regex.error: undefined property name 'InBasicLatinĚ'
(same result for other distorted "property names" containing e.g.
ěščřžýáíéúůßäëiöüîô ...)
However, in IDLE the output differs depending on the characters present:
regex.findall(ur"\p{InBasicLatinÉ}", u"ab c")
yields the expected
...
File "C:\Python26\lib\regex.py", line 1244, in _parse_property
raise error("undefined property name '%s'" % name)
error: undefined property name 'InBasicLatinÉ'
but
regex.findall(ur"\p{InBasicLatinĚ}", u"ab c")
Traceback (most recent call last):
...
File "C:\Python26\lib\regex.py", line 1244, in _parse_property
raise error("undefined property name '%s'" % name)
File "C:\Python26\lib\regex.py", line 167, in __init__
message = message.encode(sys.stdout.encoding)
File "C:\Python26\lib\encodings\cp1250.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xcc' in position 37:
character maps to <undefined>
which might be surprising, as cp1250 should be able to encode "Ě"; maybe there
is some intermediate ASCII step?
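
One possible explanation for the u'\xcc' in the message above, offered as a
guess rather than something confirmed in this thread: cp1250 encodes "Ě" as the
single byte 0xCC, so if those bytes were decoded as Latin-1 somewhere along the
way, the name would contain U+00CC ("Ì"), which cp1250 cannot encode. A minimal
sketch of that round trip (written for Python 3; the variable names are only
illustrative):

```python
# Assumption: the byte for "Ě" was misread as Latin-1 before the error
# message was built. Nothing in the thread confirms where this would happen.
original = "\u011a"                  # "Ě"
raw = original.encode("cp1250")      # cp1250 stores "Ě" as one byte, 0xCC
misread = raw.decode("latin-1")      # the same byte read as Latin-1 is "Ì"

try:
    misread.encode("cp1250")
    encodable = True
except UnicodeEncodeError:
    encodable = False                # cp1250 has no slot for "Ì" (U+00CC)
```
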
Using the wxPython PyShell I get its specific encoding error:
regex.findall(ur"\p{InBasicLatinÉ}", u"ab c")
Traceback (most recent call last):
...
File "C:\Python26\lib\regex.py", line 1102, in _parse_escape
return _parse_property(source, info, in_set, ch)
File "C:\Python26\lib\regex.py", line 1244, in _parse_property
raise error("undefined property name '%s'" % name)
File "C:\Python26\lib\regex.py", line 167, in __init__
message = message.encode(sys.stdout.encoding)
AttributeError: PseudoFileOut instance has no attribute 'encoding'
(the same for \p{InBasicLatinĚ} etc.)
Maybe it shouldn't show the property name at all. That would avoid the
problem.
In Python 3.1 in IDLE, all of these exceptions are displayed correctly, also in
other scripts or with special characters.
Maybe in Python 2.x e.g. repr(...) of the unicode error messages could be used
in order to avoid these problems, but I don't know what the conventions are in
these cases.
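
A hedged sketch of one way to avoid both failures shown above (the
UnicodeEncodeError and the missing 'encoding' attribute): look the encoding up
defensively and escape whatever it cannot represent. The helper name
safe_message and the ASCII fallback are assumptions made for illustration, not
what regex.py actually does.

```python
import sys


def safe_message(message, stream=None):
    """Return message with characters the stream cannot show escaped.

    Hypothetical helper: falls back to ASCII when the stream has no usable
    'encoding' attribute (as with the PseudoFileOut object in the traceback).
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    # 'backslashreplace' turns unencodable characters into \xNN or \uNNNN
    # escapes instead of raising UnicodeEncodeError.
    return message.encode(encoding, "backslashreplace").decode(encoding)


class NoEncodingStream(object):
    """Stand-in for a stream object without an 'encoding' attribute."""


class Cp1250Stream(object):
    """Stand-in for a console that reports cp1250 as its encoding."""
    encoding = "cp1250"


escaped = safe_message(u"undefined property name 'InBasicLatin\u011a'",
                       NoEncodingStream())
kept = safe_message(u"InBasicLatin\u011a", Cp1250Stream())
```

With the ASCII fallback the "Ě" appears as an escape in the message, while a
cp1250 stream keeps it unchanged.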
Another issue I found here (unrelated to tracebacks) is backslashes or
punctuation (except the handled -_) in the property names, which just lead to
failed matches and no exceptions about unknown property names:
regex.findall(u"\p{InBasic.Latin}", u"ab c")
[]
In the re module a malformed pattern is sometimes treated as a literal:
>>> re.match(r"a{1,2", r"a{1,2").group()
'a{1,2'
which is what I'm trying to replicate, as far as possible.
Which characters should it accept when parsing the property name, even
if it subsequently rejects the name? I don't want it to accept every
character until it sees the closing '}'. I currently include
alphanumeric, whitespace, '&', '_' and '-'. '.' might be a reasonable
addition.
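
To make the rule above concrete, here is a rough sketch of such a scan. It is
not the code in regex.py; the function name scan_property_name, its return
convention, and the choice to return None for "treat as a literal" are all
assumptions for illustration.

```python
# Characters accepted in a property name besides alphanumerics and
# whitespace, as listed above ('.' is the candidate addition, left out here).
ALLOWED_EXTRA = set("&_-")


def scan_property_name(pattern, pos):
    """Scan a property name starting at pos, just after the opening brace.

    Returns (name, index_after_closing_brace), or None if an unacceptable
    character or the end of the pattern is reached first, in which case the
    caller would fall back to treating the text as a literal.
    """
    end = pos
    while end < len(pattern):
        ch = pattern[end]
        if ch == "}":
            return pattern[pos:end], end + 1
        if not (ch.isalnum() or ch.isspace() or ch in ALLOWED_EXTRA):
            return None
        end += 1
    return None


accepted = scan_property_name(r"\p{InBasicLatin}", 3)
rejected = scan_property_name(r"\p{InBasic.Latin}", 3)
```

Under these rules a name such as InBasic.Latin never reaches the "undefined
property name" check, which matches the empty result reported above.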
I was also surprised by the added pos/endpos parameters, as I used flags as a
non-keyword third parameter for the re functions in my code (probably my fault
...)
re.findall(pattern, string, flags=0)
regex.findall(pattern, string, pos=None, endpos=None, flags=0, overlapped=False)
(is there a specific reason for this order, or could it be changed to maintain
compatibility with the current re module?)
Oops! I'll fix that.
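
Until the order is changed, passing flags by keyword avoids the problem. The
sketch below uses a hypothetical stand-in, findall_new_order, that only mimics
the parameter order quoted above on top of the standard re module; it is not
the regex module's implementation.

```python
import re


def findall_new_order(pattern, string, pos=None, endpos=None, flags=0):
    """Hypothetical stand-in with pos/endpos placed before flags."""
    compiled = re.compile(pattern, flags)
    start = 0 if pos is None else pos
    stop = len(string) if endpos is None else endpos
    return compiled.findall(string, start, stop)


# By keyword, the flag is applied as intended under either ordering.
by_keyword = findall_new_order("a", "A a", flags=re.IGNORECASE)

# Positionally, re.IGNORECASE (integer value 2) is silently taken as pos,
# so the search starts at index 2 and the match is case-sensitive.
by_position = findall_new_order("a", "A a", re.IGNORECASE)
```
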
I hope at least some of these remarks make some sense;
thanks for the continued work on this module!
All constructive remarks are welcome! :-)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev