New submission from Christian Klein:

The Python 2.7 re module seems to be inconsistent about what it considers a word character:

    import re
    s = u'f\xfc'
    print re.sub('\W', '*', s, re.UNICODE)
    print re.findall('\w', s, re.UNICODE)

The call to re.sub replaces the character u'ü', which implies that it is considered a non-word character (\W). But re.findall then reports it as a word character (\w).

Python 3.4 and Python 3.5 behave correctly, or at least consistently. (Unfortunately, those versions are not an option on Google App Engine.)

----------
components: Regular Expressions
messages: 248560
nosy: cklein, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Incoherent bevavior with umlaut in regular expressions
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24863>
_______________________________________
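
One possible reading of the reported snippet, which the report itself does not confirm: the documented signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is taken as count, whereas the third positional argument of re.findall is flags. The minimal sketch below passes the flag by keyword instead. The variable names are illustrative only.

```python
import re

s = u'f\xfc'  # 'f' followed by U+00FC (u with diaeresis)

# Documented signature: re.sub(pattern, repl, string, count=0, flags=0).
# Passing the flag by keyword keeps it from being read as count.
substituted = re.sub(r'\W', '*', s, flags=re.UNICODE)

# re.findall(pattern, string, flags=0): here the third positional
# argument really is flags, so this call matches the one in the report.
found = re.findall(r'\w', s, re.UNICODE)

# The integer that a positional re.UNICODE would supply as count.
count_value = int(re.UNICODE)
```

With the keyword form, both calls treat U+00FC as a word character, so the substitution leaves the string unchanged.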