Re: convert unicode characters to visibly similar ascii characters

M.-A. Lemburg Wed, 02 Jul 2008 01:42:09 -0700

On 2008-07-01 20:31, Peter Bulychev wrote:

Hello.


I want to convert a unicode character into an ascii one.
The method ".encode('ASCII')" can convert only those unicode characters
which fit into the 0..127 range.

But there are still lots of characters beyond this range which can be
manually converted to some visibly similar ascii characters. For instance,
there are several quotation marks in unicode which can be converted into
the ascii quotation mark.

Can this conversion be performed automatically? After googling I've
only found that there exists a Unicode database, which stores human-readable
information on the notation of all unicode characters (
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt). And there also exists
a Python adapter for this database (
http://docs.python.org/lib/module-unicodedata.html). Using this database I
can do something like `if notation.find('QUOTATION')!=-1:\n\treturn "'"`. I
believe there is a more elegant way. Am I right?


You could write a codec which translates Unicode into ASCII
lookalike characters, but AFAIK there is no standard for doing
this.

I guess the best choice is to use the Unicode code point names
as a basis. These can be accessed via unicodedata.name(). You can
then create a mapping which can be processed by the character
map codec.
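
A rough sketch of that idea, written for Python 3 (an assumption; the
thread predates it). The rule table NAME_RULES and the helpers
build_lookalike_map() and to_ascii_lookalike() are hypothetical names made
up for illustration, and the choice of replacements is arbitrary since
there is no standard to follow. The mapping is fed to
codecs.charmap_encode(), the function the standard library's
charmap-based codecs use; characters without a rule come out as "?"
because of the "replace" error handler.

```python
import codecs
import unicodedata

# Hypothetical rules: fragment of the code point name -> ASCII replacement.
# Order matters: the first matching fragment wins.
NAME_RULES = [
    ("DOUBLE QUOTATION MARK", '"'),
    ("QUOTATION MARK", "'"),
    ("DASH", "-"),
    ("HYPHEN", "-"),
]


def build_lookalike_map(chars):
    """Build an encoding map {code point: byte value} for the charmap codec."""
    # Plain ASCII maps to itself (this also covers the "?" used by "replace").
    mapping = {cp: cp for cp in range(128)}
    for ch in chars:
        if ord(ch) < 128:
            continue
        # Unnamed code points get an empty name and therefore match no rule.
        name = unicodedata.name(ch, "")
        for fragment, replacement in NAME_RULES:
            if fragment in name:
                mapping[ord(ch)] = ord(replacement)
                break
    return mapping


def to_ascii_lookalike(text):
    """Encode text through the name-based map and return an ASCII-only str."""
    mapping = build_lookalike_map(text)
    encoded, _length = codecs.charmap_encode(text, "replace", mapping)
    return encoded.decode("ascii")


print(to_ascii_lookalike("\u201cHi\u201d \u2014 it\u2019s"))
```

Building the map from the characters actually present keeps it small; a
real codec would more likely precompute the map once over the whole
Unicode range and register itself via codecs.register().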

--
Marc-Andre Lemburg
eGenix.com

