New submission from Christian Klein:

The Python 2.7 re module does not seem to agree with itself on what counts as a word character:

import re
s = u'f\xfc'
print re.sub('\W', '*', s, re.UNICODE)
print re.findall('\w', s, re.UNICODE)

The call to re.sub replaces the character u'ü' with '*', which implies it is
considered a non-word character (\W).
But then re.findall shows it as a word character (\w).
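
One possible explanation, offered as a sketch and not a confirmed diagnosis:
the documented signature is re.sub(pattern, repl, string, count=0, flags=0),
so the fourth positional argument above would be read as count and not as
flags, while re.findall(pattern, string, flags=0) does receive it as flags.
The variant below passes the flag by keyword in both calls; the names
replaced and matched are only illustrative.

```python
import re

s = u'f\xfc'

# Pass re.UNICODE by keyword so it cannot be mistaken for the count
# argument of re.sub (assumption: that is what happens in the calls above).
replaced = re.sub(u'\\W', u'*', s, flags=re.UNICODE)
matched = re.findall(u'\\w', s, flags=re.UNICODE)

# With the flag applied to both calls, the two results should agree on
# whether u'\xfc' is a word character.
print(repr(replaced))
print(repr(matched))
```

If both calls then treat the umlaut as a word character, the difference
comes from the argument order and not from the matching itself.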

Python 3.4 and Python 3.5 behave correctly, or at least coherently.
(Unfortunately, they are not an option on Google App Engine.)

----------
components: Regular Expressions
messages: 248560
nosy: cklein, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Incoherent behavior with umlaut in regular expressions
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24863>
_______________________________________