[issue21081] missing vietnamese codec TCVN 5712:1993 in Python

Antti Haapala Fri, 21 Oct 2016 14:02:57 -0700

Antti Haapala added the comment:

Ah, there was something I overlooked before: VN1 and VN2 both have combining
accents too. If I read correctly, the main letter should precede the
combining character, just as in Unicode; VN3 seems to lack combining characters
altogether.
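
To illustrate the ordering on the Unicode side only (this does not touch TCVN
itself), here is a small sketch: the base letter comes first, the combining
accent second, and NFC normalization folds the pair into one precomposed
letter. The sample letter is just an example I picked.

```python
import unicodedata

# Base letter first, combining mark second, as Unicode requires.
decomposed = "\u00ea\u0301"  # 'ê' followed by COMBINING ACUTE ACCENT

# NFC composes the pair into a single precomposed code point where one exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(unicodedata.name(composed))
```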


Thus, for simple text conversion from VN* to Unicode, VN1 should be enough, but
some VN2/VN3 control/application-specific codes might show up as accented
capital letters.

---

The following script rips the table from iconv:

    import subprocess

    # Feed every byte value 0x00-0xFF through iconv and decode its UTF-8 output.
    mapping = subprocess.run('iconv -f TCVN -t UTF-8'.split(),
                             input=bytes(range(256)),
                             stdout=subprocess.PIPE).stdout.decode()

There were several aliases for the charset, but all of them seemed to produce
identical output. The output matches VN1 from the tables.
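
A rough sketch of how such a ripped string could back a codec, assuming it
ends up with exactly one code point per input byte (that would need checking
for the real iconv output, which may reject or merge some bytes). Latin-1
stands in for the ripped table here so the sketch runs without iconv; the
names decoding_table, encoding_table, decode and encode are mine.

```python
import codecs

# Stand-in for the string ripped from iconv: one code point per byte value.
# Latin-1 is used only so this runs without iconv; it is NOT the TCVN table.
decoding_table = bytes(range(256)).decode("latin-1")
assert len(decoding_table) == 256  # would need checking for the real rip

# Reverse mapping, built the same way the stdlib charmap codecs build theirs.
encoding_table = codecs.charmap_build(decoding_table)


def decode(data, errors="strict"):
    # charmap_decode returns (text, number_of_bytes_consumed)
    return codecs.charmap_decode(data, errors, decoding_table)[0]


def encode(text, errors="strict"):
    # charmap_encode returns (bytes, number_of_characters_consumed)
    return codecs.charmap_encode(text, errors, encoding_table)[0]
```

This handles only the one-byte-to-one-code-point part; composing a base letter
with a following combining accent would be a separate step.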

Additionally, luatvn.net *did* have a copyable VN1-to-UCS2 table.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21081>
_______________________________________
