Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

wxjmfauth Wed, 17 Oct 2012 08:36:13 -0700

Le mercredi 17 octobre 2012 17:00:46 UTC+2, Dave Angel a écrit :
> On 10/17/2012 10:31 AM, nwaits wrote:
> 
> > I'm very impressed with python's wordlist script for plain text.  Is there 
> > a script for finding words that do NOT have certain diacritic marks, like 
> > acute or grave accents (utf-8), over the vowels?  
> 
> > Thank you.
> 
> 
> 
> if you can construct a list of "illegal" characters, then you can simply
> 
> check each character of the word against the list, and if it succeeds
> 
> for all of the characters, it's a winner.
> 
> 
> 
> If that's not fast enough, you can build a translation table from the
> 
> list of illegal characters, and use translate on each word.  Then it
> 
> becomes a question of checking if the translated word is all zeroes.  
> 
> More setup time, but much faster looping for each word.
> 
> 
> 
> -- 
> 
> 
> 
> DaveA


Lazy way.
Py3.2

>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...     
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
>>>

Should be ok for the CombiningDiacriticalMarks unicode range
(common diacritics)

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

Reply via email to