New submission from Tom Christiansen <tchr...@perl.com>:

You cannot use Python's re library to handle Unicode regular expressions, because it violates the standard that UTS#18 (Unicode Regular Expressions) sets out in RL1.2a on compatibility properties. What \w is allowed to match is clearly explained there, but Python has its own idea. Because it is in clear violation of the standard, it is misleading and wrong for Python to claim that the re.UNICODE flag makes \w and friends match Unicode.

Here are the failed test cases when the attached file is run under v3.2; there are further failures when run under v2.7.
FAIL lib re found non alphanumeric string cafeฬ
FAIL lib re found non alphanumeric string โ
FAIL lib re found non alphanumeric string อ
FAIL lib re found non alphanumeric string ึฐ
FAIL lib re found non alphanumeric string ๐
FAIL lib re found non alphanumeric string ๐
FAIL lib re found non alphanumeric string ๐๐ซ๐ฆ๐ ๐ฌ๐ก๐ข
FAIL lib re found non alphanumeric string ๐๐ฏ๐ ๐จ๐๐ฏ๐ป
FAIL lib re found non alphanumeric string connectorโฟpunctuation
FAIL lib re found non alphanumeric string แพบอ _ฮฃฯฮฟ_ฮฮนฮฌฮฟฮปฮฟ
FAIL lib re found non alphanumeric string ๐ฐ๐๐๐ฐโฟ๐ฟ๐ฝ๐๐ฐ๐โฟ๐ธ๐ฟโฟ๐น๐ฝโฟ๐ท๐น๐ผ๐น๐ฝ๐ฐ๐ผ
FAIL lib re found all alphanumeric string ¹²³
FAIL lib re found all alphanumeric string โโโ
FAIL lib re found all alphanumeric string ¼½¾
FAIL lib re found all alphanumeric string โถ

Note that Matthew Barnett's regex library for Python handles all of these cases in conformance with The Unicode Standard.

----------
components: Regular Expressions
files: alnum.python
messages: 141920
nosy: tchrist
priority: normal
severity: normal
status: open
title: python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file22881/alnum.python

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12731>
_______________________________________
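
For readers without the attached alnum.python, the kind of mismatch reported above can be sketched roughly as follows. This is not the attached test file. It assumes that RL1.2a defines \w as Alphabetic + Mark + Decimal_Number + Connector_Punctuation + Join_Control, and because the standard unicodedata module does not expose the Alphabetic property, general categories L* and Nl stand in for it here, which is only an approximation. The helper names (is_uts18_word_char, is_uts18_word, re_says_word) and the sample strings are illustrative choices, not taken from the report.

```python
import re
import unicodedata

# Approximation of the UTS#18 RL1.2a word class (an assumption, see above):
# letters and letter numbers (stand-in for Alphabetic), all marks,
# decimal digits, and connector punctuation.
WORD_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Nl",  # rough stand-in for Alphabetic
    "Mn", "Mc", "Me",                    # Mark
    "Nd",                                # Decimal_Number
    "Pc",                                # Connector_Punctuation
}
# Join_Control: ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER.
JOIN_CONTROL = {"\u200c", "\u200d"}


def is_uts18_word_char(ch):
    """True if ch falls in the approximated UTS#18 word class."""
    return ch in JOIN_CONTROL or unicodedata.category(ch) in WORD_CATEGORIES


def is_uts18_word(text):
    """True if text is non-empty and every character is a word character."""
    return bool(text) and all(is_uts18_word_char(ch) for ch in text)


def re_says_word(text):
    """True if the standard re module matches the whole string with \\w+."""
    return re.fullmatch(r"\w+", text) is not None


# Illustrative samples: a combining mark, connector punctuation,
# superscript digits, and vulgar fractions.
SAMPLES = [
    "cafe\u0301",
    "connector\u203fpunctuation",
    "\u00b9\u00b2\u00b3",
    "\u00bc\u00bd\u00be",
]

for sample in SAMPLES:
    print(ascii(sample), "uts18:", is_uts18_word(sample), "re:", re_says_word(sample))
```

Any row where the two columns disagree is a candidate for the kind of failure listed above. The exact set of disagreements depends on the Python version and its bundled Unicode database.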