New submission from Tom Christiansen <tchr...@perl.com>:

You cannot use Python's lib re for handling Unicode regular expressions because 
it violates the standard set out for the same in UTS#18 on Unicode Regular 
Expressions in RL1.2a on compatibility properties.  What \w is allowed to match 
is clearly explained there, but Python has its own idea. Because it is in clear 
violation of the standard, it is misleading and wrong for Python to claim that 
the re.UNICODE flag makes \w and friends match Unicode.  Here are the failed 
test cases when the attached file is run under v3.2; there are further failures 
when run under v2.7.

FAIL lib re    found non alphanumeric string café
FAIL lib re    found non alphanumeric string Ⓚ
FAIL lib re    found non alphanumeric string ͅ
FAIL lib re    found non alphanumeric string ְ
FAIL lib re    found non alphanumeric string 𝟘
FAIL lib re    found non alphanumeric string 𐍁
FAIL lib re    found non alphanumeric string 𝔘𝔫𝔦𝔠𝔬𝔡𝔢
FAIL lib re    found non alphanumeric string 𐐔𐐯𐑅𐐨𐑉𐐯𐐻
FAIL lib re    found non alphanumeric string connector‿punctuation
FAIL lib re    found non alphanumeric string Ὰͅ_Στο_Διάολο
FAIL lib re    found non alphanumeric string 𐌰𐍄𐍄𐌰‿𐌿𐌽𐍃𐌰𐍂‿𐌸𐌿‿𐌹𐌽‿𐌷𐌹𐌼𐌹𐌽𐌰𐌼
FAIL lib re    found all alphanumeric string ¹²³
FAIL lib re    found all alphanumeric string ₁₂₃
FAIL lib re    found all alphanumeric string ¼½¾
FAIL lib re    found all alphanumeric string ⑶

Note that Matthew Barnett's regex lib for Python handles all of these cases in 
conformance with The Unicode Standard.
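
The kind of check behind the failures above can be sketched as follows. This is 
not the attached alnum.python; it is a minimal illustration that assumes \w per 
RL1.2a covers Alphabetic, Mark, Decimal_Number, Connector_Punctuation and 
Join_Control. The helper names (approx_uts18_word_char, approx_uts18_is_word, 
re_is_word) are made up for this example. Because the stdlib unicodedata module 
does not expose the Alphabetic property, the sketch approximates it by general 
category, so Other_Alphabetic characters such as Ⓚ are not covered.

```python
import re
import unicodedata

# Join_Control code points: ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER.
JOIN_CONTROL = {"\u200c", "\u200d"}


def approx_uts18_word_char(ch):
    # Approximation (an assumption of this sketch): letters (L*) and letter
    # numbers (Nl) stand in for Alphabetic; marks (M*), decimal digits (Nd),
    # connector punctuation (Pc) and Join_Control are included as well.
    cat = unicodedata.category(ch)
    return cat[0] in "LM" or cat in ("Nd", "Nl", "Pc") or ch in JOIN_CONTROL


def approx_uts18_is_word(s):
    # True when every character of a non-empty string is a word character
    # under the approximation above.
    return bool(s) and all(approx_uts18_word_char(ch) for ch in s)


def re_is_word(s):
    # True when the stdlib re module considers the whole string to be \w+.
    return re.fullmatch(r"\w+", s) is not None


if __name__ == "__main__":
    samples = [
        "cafe\u0301",                   # e followed by COMBINING ACUTE ACCENT
        "connector\u203fpunctuation",   # UNDERTIE, a connector punctuation
        "\u00b9\u00b2\u00b3",           # superscript digits, category No
    ]
    for s in samples:
        print(ascii(s), "re:", re_is_word(s), "approx:", approx_uts18_is_word(s))
```

Where the two columns disagree, the string is one that lib re classifies 
differently from this approximation of RL1.2a. Results from lib re may differ 
between Python versions.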

----------
components: Regular Expressions
files: alnum.python
messages: 141920
nosy: tchrist
priority: normal
severity: normal
status: open
title: python lib re uses obsolete sense of \w in full violation of UTS#18 
RL1.2a
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file22881/alnum.python

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12731>
_______________________________________