Vlastimil Brom <[email protected]> added the comment:
I just noticed a corner case with the newly introduced grapheme matcher \X
when it is used inside a character set:
>>> regex.findall("\X", "abc")
['a', 'b', 'c']
>>> regex.findall("[\X]", "abc")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "regex.pyc", line 218, in findall
  File "regex.pyc", line 1435, in _compile
  File "regex.pyc", line 2351, in optimise
  File "regex.pyc", line 2705, in optimise
  File "regex.pyc", line 2798, in optimise
  File "regex.pyc", line 2268, in __hash__
AttributeError: '_Sequence' object has no attribute '_key'
It obviously doesn't make much sense to use this universal matcher inside a
character class (the same goes for "." in its metacharacter role), and
http://www.regular-expressions.info/refunicode.html doesn't mention this
possibility either; but the error message could be more descriptive, or the
pattern could simply match a literal "X" (or both "\" and "X") (?)
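
As a rough illustration of the "more descriptive error" option, here is a
minimal sketch using the standard library re module rather than the regex
module discussed in this issue. It assumes a recent Python 3, where re rejects
the unknown escape \X with re.error; the helper name compile_or_explain is
hypothetical and not part of either module.

```python
import re


def compile_or_explain(pattern):
    """Compile a pattern, turning a compile failure into a readable error.

    Hypothetical helper for illustration only; it wraps the stdlib re
    module, not the regex module from this issue.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Report the offending pattern together with the engine's message.
        raise ValueError("cannot compile %r: %s" % (pattern, exc)) from exc


# The alternative reading: a set containing a literal "X" and a backslash.
literal_set = compile_or_explain(r"[X\\]")
```

With this helper, compile_or_explain(r"[\X]") fails with a message naming the
pattern, while literal_set matches either "X" or "\".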
I was originally thinking about the possibility of combining positive and
negative character classes, where e.g. \X would serve as a kind of base; I am
not aware of any regex engine supporting this, but I eventually found the
Unicode guidelines for regular expressions, which also cover this:
http://unicode.org/reports/tr18/#Subtraction_and_Intersection
It is also a bit surprising that these are all included in
Basic Unicode Support: Level 1 (even with arbitrary unions, intersections,
differences ...); this suggests that, AFAIK, there is probably no
implementation that fully conforms to this guideline, even at this basic level.
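
For engines without set operations, subtraction and intersection of character
classes can be approximated with lookaheads. The following is only a sketch
using the standard library re module; the names CONSONANT and VOWEL_A_TO_M are
made up for the example, and this is not the set syntax described in TR18.

```python
import re

# Subtraction: [a-z] minus [aeiou]. The negative lookahead rejects the
# characters to be removed before the base class consumes one character.
CONSONANT = re.compile(r"(?![aeiou])[a-z]")

# Intersection: [aeiou] and [a-m]. The positive lookahead requires the next
# character to be in the first class, then the second class consumes it.
VOWEL_A_TO_M = re.compile(r"(?=[a-m])[aeiou]")

consonants = CONSONANT.findall("regex")
vowels = VOWEL_A_TO_M.findall("regex")
```

Here consonants holds the letters of "regex" that are not vowels, and vowels
holds those that are both vowels and in the range a-m.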
Among the other features on this level, the section
http://unicode.org/reports/tr18/#Supplementary_Characters
seems useful, especially the handling of characters beyond \uffff, including
treating surrogate pairs as single characters.
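
To make the surrogate-pair point concrete, here is a small sketch of how a
character beyond \uffff maps to two UTF-16 code units, and how a pattern can
treat such a pair as one item. It uses the standard library re module on a
current Python 3; the helper names surrogate_pair and to_code_units and the
pattern CHAR are assumptions made for this example, not anything in the regex
module.

```python
import re


def surrogate_pair(code_point):
    """Split a code point above 0xFFFF into (high, low) UTF-16 surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def to_code_units(text):
    """Return a string with one character per UTF-16 code unit of text.

    This imitates how a "narrow" build would store the text.
    """
    data = text.encode("utf-16-be")
    return "".join(
        chr(int.from_bytes(data[i:i + 2], "big"))
        for i in range(0, len(data), 2)
    )


# Match a high surrogate followed by a low surrogate as a single item,
# otherwise fall back to any single code unit.
CHAR = re.compile(r"[\ud800-\udbff][\udc00-\udfff]|.", re.DOTALL)

narrow = to_code_units("a\U0001D11Eb")  # U+1D11E lies beyond \uffff
items = CHAR.findall(narrow)
```

The string narrow has four code units, but CHAR.findall splits it into three
items, with the surrogate pair kept together.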
Such handling might be useful on narrow Python builds, but it is possible that
there would be an incompatibility with the way "narrow" Python itself handles
such data.
Just some suggestions or rather remarks, as you already implemented many
advanced features and are also considering some different approaches ...:-)
vbr
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue2636>
_______________________________________