Does Python support the Unicode-flavored class-specifications in regular expressions, e.g. \p{L} ? It doesn't work in the following code, any ideas?
----- #! /usr/local/bin/python """ usage: ./uni_read.py file """ import codecs import re text = codecs.open(sys.argv[1], mode='r', encoding='utf-8').read() unicode_property_pattern = re.compile(r"\p{L}") dot_pattern = re.compile(".") letters = unicode_property_pattern.findall(text) characters = dot_pattern.findall(text) print 'var letters =', letters print 'var characters = ', characters ----- The input file, encoded in utf-8 is abc <followed by space, alpha, beta gamma> The output is: var letters = [] var characters = [u'a', u'b', u'c', u' ', u'\u03b1', u'\u03b2', u'\u03b3'] _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor