On Wednesday, 17 October 2012 17:00:46 UTC+2, Dave Angel wrote:
> On 10/17/2012 10:31 AM, nwaits wrote:
> > I'm very impressed with python's wordlist script for plain text. Is there
> > a script for finding words that do NOT have certain diacritic marks, like
> > acute or grave accents (utf-8), over the vowels?
> > Thank you.
>
> if you can construct a list of "illegal" characters, then you can simply
> check each character of the word against the list, and if it succeeds
> for all of the characters, it's a winner.
>
> If that's not fast enough, you can build a translation table from the
> list of illegal characters, and use translate on each word. Then it
> becomes a question of checking if the translated word is all zeroes.
> More setup time, but much faster looping for each word.
>
> --
> DaveA
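
For reference, a minimal sketch of the two suggestions quoted above, assuming Python 3 and words stored in precomposed (NFC) form. The names ACCENTED, is_clean and is_clean_translate are hypothetical, as is the choice of "illegal" characters. In Python 3, str.translate can delete characters outright, so the test becomes "did the word change" rather than "is the result all zeroes".

```python
# Hypothetical list of "illegal" characters: vowels with acute or grave accents.
# Assumes the words to check are in precomposed (NFC) form.
ACCENTED = "áéíóúàèìòùÁÉÍÓÚÀÈÌÒÙ"

def is_clean(word, illegal=frozenset(ACCENTED)):
    """Return True if no character of word is in the illegal set."""
    return not any(ch in illegal for ch in word)

# Translation table that deletes every illegal character (built once).
DELETE_ACCENTED = str.maketrans("", "", ACCENTED)

def is_clean_translate(word):
    """Return True if deleting the illegal characters leaves word unchanged."""
    return word.translate(DELETE_ACCENTED) == word

words = ["éléphant", "elephant", "voilà", "python"]
print([w for w in words if is_clean(w)])
print([w for w in words if is_clean_translate(w)])
```

Both functions should select the same words. The translate version pays for building the table once and then does the per-word work inside a single built-in call.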
Lazy way.

Py3.2
>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
>>>

Should be ok for the CombiningDiacriticalMarks Unicode range
(common diacritics)

jmf

--
http://mail.python.org/mailman/listinfo/python-list
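
A possible refinement of the length comparison, offered as a sketch rather than a definitive answer. The length test reports 'no' for input that is already decomposed (for example 'e' followed by U+0301), and 'yes' for characters that NFKD expands without any accent being involved (for example the ligature U+FB01). Decomposing with NFD and looking for combining marks avoids both cases. The function names below are hypothetical, and restricting the second check to U+0300 and U+0301 is an assumption about what "acute or grave" should cover.

```python
import unicodedata

def has_combining_mark(word):
    """Return True if word contains a combining mark after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    # unicodedata.combining() returns the canonical combining class,
    # which is non-zero for the common combining diacritics.
    return any(unicodedata.combining(ch) for ch in decomposed)

def has_acute_or_grave(word):
    """Return True if word carries a combining grave (U+0300) or acute (U+0301)."""
    decomposed = unicodedata.normalize("NFD", word)
    return any(ch in "\u0300\u0301" for ch in decomposed)

words = ["éléphant", "elephant", "voilà", "naïve"]
# Words with no acute or grave accent ("naïve" has only a diaeresis).
print([w for w in words if not has_acute_or_grave(w)])
```

Letters such as 'ø' have no canonical decomposition, so neither this sketch nor the length test treats the stroke as a diacritic.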