New submission from Frédéric Grosshans-André <frederic.grossh...@gmail.com>:

Currently (Python 3.8.6, unidata_version 12.1.0), unicodedata.decomposition
returns an empty string for Hangul syllables (code points in the AC00..D7A3
range), although their decomposition is not empty: it is always two characters
(either an LV syllable and a T jamo, or an L jamo and a V jamo). This
decomposition can be deduced algorithmically (see §3.12 of the Unicode
Standard). A Python version of the algorithm is below (I don’t know C, so I
can’t propose a patch).

For each Hangul syllable hs, I have used unicodedata.normalize to check that
the NFC of the decomposition is indeed hs, that the decomposition is two code
points long, and that the NFD of hs and the NFD of its decomposition coincide.

def hangulsyllabledecomposition(c):
    if not 0xAC00 <= ord(c) <= 0xD7A3:
        raise ValueError('only Hangul syllables allowed')
    dLV, T = divmod(ord(c) - 0xAC00, 28)
    if T != 0:  # it is an LVT syllable, decomposed into LV (0xAC00 + dLV*28) and T
        return f'{0xAC00+dLV*28:04X} {0x11A7+T:04X}'
    else:  # it is an LV syllable, decomposed into L and V
        L, V = divmod(dLV, 21)
        return f'{0x1100+L:04X} {0x1161+V:04X}'
    # Constants used:
    # ==============
    # 0xAC00 : first Hangul syllable == first LV syllable
    #          NB: there is one LV syllable every 28 code points
    # 0xD7A3 : last Hangul syllable
    # 0x1100 : first L jamo
    # 0x1161 : first V jamo
    # 0x11A7 : one before the first T jamo (0x11A8), since T=0 means no trailing
    #          consonant
    #
    # (all numbers below are restricted to modern jamos, where this algorithm
    # is relevant)
    # 19 : Number of L jamos (not used here)
    # 21 : Number of V jamos
    # 28 : Number of T jamos plus one (since there is no T jamo in an LV syllable)
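
The three checks described above could be reproduced along the following
lines. This is only a sketch: the names hangul_decomposition,
check_all_syllables and the upper-case constants are hypothetical and not part
of unicodedata, and the helper returns the decomposition as a string of
characters instead of hexadecimal code points so that it can be passed
directly to unicodedata.normalize.

```python
import unicodedata

# Constants taken from the function above (names are illustrative only).
S_BASE, S_LAST = 0xAC00, 0xD7A3          # first and last Hangul syllables
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def hangul_decomposition(c):
    """Hypothetical helper: two-character decomposition of a Hangul syllable."""
    code = ord(c)
    if not S_BASE <= code <= S_LAST:
        raise ValueError('only Hangul syllables allowed')
    lv_index, t_index = divmod(code - S_BASE, T_COUNT)
    if t_index:
        # LVT syllable: LV syllable followed by a trailing (T) jamo
        return chr(S_BASE + lv_index * T_COUNT) + chr(T_BASE + t_index)
    # LV syllable: leading (L) jamo followed by a vowel (V) jamo
    l_index, v_index = divmod(lv_index, V_COUNT)
    return chr(L_BASE + l_index) + chr(V_BASE + v_index)


def check_all_syllables():
    """Return the syllables for which any of the three checks fails."""
    failures = []
    for code in range(S_BASE, S_LAST + 1):
        hs = chr(code)
        d = hangul_decomposition(hs)
        ok = (
            len(d) == 2
            and unicodedata.normalize('NFC', d) == hs
            and unicodedata.normalize('NFD', d) == unicodedata.normalize('NFD', hs)
        )
        if not ok:
            failures.append(hs)
    return failures
```

As a worked example, U+D55C gives divmod(0xD55C - 0xAC00, 28) == (378, 4), so
it decomposes into the LV syllable U+D558 and the T jamo U+11AB; U+D558 in
turn gives divmod(378, 21) == (18, 0), that is the L jamo U+1112 and the V
jamo U+1161.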

----------
components: Unicode
messages: 391715
nosy: ezio.melotti, frederic.grosshans, vstinner
priority: normal
severity: normal
status: open
title: Add Hangul syllables to unicodedata.decomposition
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43925>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
