Hello,

Several posters (including a certain GvR) in the bug tracker (*) have been baffled by an apparent bug where the re.IGNORECASE flag didn't imply case-insensitivity for non-ASCII characters. It turns out that, although the pattern was a string object and although Py3k is supposed to be Unicode-friendly, you still need to supply the re.UNICODE flag if you want the re module to use Unicode-aware case-insensitive matching.
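
To make the difference concrete, here is a minimal sketch contrasting ASCII-only and Unicode-aware case-insensitive matching. It rests on an assumption not in the discussion above: it uses the re.ASCII flag, which exists in released Python 3 versions (where str patterns are already Unicode-aware by default), to reproduce the ASCII-only behaviour explicitly. The variable names are illustrative only.

```python
import re

# Assumption: re.ASCII (available in released Python 3) is used here to
# force ASCII-only case folding, mimicking the behaviour described above.
ascii_pat = re.compile('Á', re.IGNORECASE | re.ASCII)

# With re.UNICODE, case folding also covers non-ASCII letters.
unicode_pat = re.compile('Á', re.IGNORECASE | re.UNICODE)

ascii_result = ascii_pat.match('á')      # ASCII-only folding: no match
unicode_result = unicode_pat.match('á')  # Unicode-aware folding: match

print(ascii_result)
print(unicode_result)
```

With ASCII-only folding, 'Á' and 'á' are treated as unrelated characters, so only the second pattern matches.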
Wouldn't it be more natural that, at least when the pattern is a str object rather than a bytes object, re.UNICODE be implied by default?

(*) http://bugs.python.org/issue2834

Another question in the same vein: is it normal that we can match a bytes object with a str pattern and vice versa?

    import re

    # str pattern matched against a bytes object
    pat = re.compile('Á', re.IGNORECASE | re.UNICODE)
    pat.match('á'.encode('latin1'))
    # gives <_sre.SRE_Match object at 0xb7c66c60>

    # bytes pattern matched against a str object
    pat = re.compile('Á'.encode('latin1'), re.IGNORECASE | re.UNICODE)
    pat.match('á')
    # gives <_sre.SRE_Match object at 0xb7c66c60>

Regards

Antoine.

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000