2009/6/18 Hannes Röst:
> the problem is here:
> (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
>
> It seems to be the case that \b does not work with the German eszett,
> whereas \< does work in my case. Should this be changed in all cases
> where \b is used? Do you have other suggestions?
Hello!

>>> import re
>>> t = u'Großdeutschen Reich sdfsfasff deutschen Reich'
>>> re.findall(r'(\bdeutsche[rn]? Reich\b)', t)
[u'deutschen Reich', u'deutschen Reich']
>>> re.findall(r'(?u)\bdeutsche[rn]? Reich\b', t)
[u'deutschen Reich']
>>> re.findall(r'\bdeutsche[rn]? Reich\b', t, re.U)
[u'deutschen Reich']

In other words, you just have to tell the regex engine to use Unicode semantics, so that \b (like \w) treats "ß" as a word character instead of a boundary. Either put (?u) in the regex (the start of the pattern is the safest place), or compile with the re.U flag :)

Regards,
--
Nicolas Dumazet - NicDumZ [ nɪk.d̪ymz ]

_______________________________________________
Pywikipedia-l mailing list
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
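A minimal Python 3 sketch of the same point, assuming a Python 3 interpreter: there, str patterns are Unicode-aware by default, so the (?u) / re.U switch is already on, and re.ASCII is used below only to reproduce the old Python 2 default. It runs the findall test and also applies the replacement pair quoted from Hannes' message with re.sub; the variable names are illustrative only.

```python
import re

# Text from the thread: "Großdeutschen" contains the eszett (ß),
# which is not an ASCII letter.
text = "Großdeutschen Reich sdfsfasff deutschen Reich"

# The replacement pair quoted in the thread: (pattern, replacement).
old, new = r"\bdeutsche(r|n|) Reich\b", r"Deutsche\1 Reich"
search = r"\bdeutsche[rn]? Reich\b"

# re.ASCII mimics the Python 2 default: "ß" is not a word character,
# so \b matches between "ß" and "d" and the compound word is hit too.
ascii_hits = re.findall(search, text, re.ASCII)
ascii_fixed = re.sub(old, new, text, flags=re.ASCII)

# Python 3 str patterns are Unicode-aware by default (same as re.UNICODE),
# so "ß" counts as a word character and only the standalone phrase matches.
unicode_hits = re.findall(search, text)
unicode_fixed = re.sub(old, new, text)

print(ascii_hits)     # ['deutschen Reich', 'deutschen Reich']
print(ascii_fixed)    # GroßDeutschen Reich sdfsfasff Deutschen Reich
print(unicode_hits)   # ['deutschen Reich']
print(unicode_fixed)  # Großdeutschen Reich sdfsfasff Deutschen Reich
```

Note that re.ASCII does not exist in Python 2; on Python 2 the fix is exactly the one in the reply: add (?u) or pass re.U.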
