On 4/3/2017 12:12 AM, Gerriet M. Denkmann wrote:
The Combining Class is used for normalisation of strings.
Normalisation of strings is important for filenames in filesystems.

The same issues apply to network identifiers.

As far as I know, a Thai consonant (Lo, Other_Letter) can have several 
Nonspacing_Marks.
This cluster of nonspacing marks can contain at most one top/bottom vowel and 
at most one tone/other mark.
There is no syntactic meaning in the order of these nonspacing marks.

So: all top/bottom vowels should have Combining Class 103, and all tone/other 
marks should have Combining Class 107.

Is there a reason for having top vowels or other-marks with Combining Class 0, 
Not_Reordered?

With the current choice of Combining Class, both consonant + mark + top vowel 
and consonant + top vowel + mark are already in normalised form, so neither is 
mapped to the other. One can therefore have two files with these 
(identical-looking, but different) names, which is rather confusing.
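
A minimal Python sketch of this behaviour, assuming only the standard 
unicodedata module. The choice of KO KAI, SARA I, SARA U and MAI EK as sample 
characters and all variable names are illustrative assumptions:

```python
import unicodedata

KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI (consonant, starter)
SARA_I = "\u0e34"   # top vowel, Combining Class 0
SARA_U = "\u0e38"   # bottom vowel, Combining Class 103
MAI_EK = "\u0e48"   # tone mark, Combining Class 107


def nfc(s):
    """Return the NFC form of s."""
    return unicodedata.normalize("NFC", s)


# Top vowel (ccc 0) and tone mark: neither order is changed by NFC.
top_a = KO_KAI + SARA_I + MAI_EK
top_b = KO_KAI + MAI_EK + SARA_I

# Bottom vowel (ccc 103) and tone mark (ccc 107): canonical reordering
# puts the lower class first, so both orders end up identical.
bottom_a = KO_KAI + SARA_U + MAI_EK
bottom_b = KO_KAI + MAI_EK + SARA_U

top_distinct = nfc(top_a) != nfc(top_b)
bottom_unified = nfc(bottom_a) == nfc(bottom_b)

print("top vowel orders stay distinct:", top_distinct)
print("bottom vowel orders are unified:", bottom_unified)
```

The bottom-vowel pair shows what a non-zero Combining Class achieves; the 
top-vowel pair shows the two distinct names described above.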

It is not possible to construct a set of secure network identifiers based on simply
a) ensuring the string is in NFC
b) otherwise allowing all of the Thai characters (insofar as they are PVALID in IDNA 2008 [RFC5892]).

Considerable attention to allowable contexts is required. There is a group in Thailand working on this, but their results have not yet been made public.

Similar work for Khmer and Lao can be found here:
https://www.icann.org/en/system/files/files/proposal-khmer-lgr-15aug16-en.pdf
https://www.icann.org/en/system/files/files/proposal-lao-lgr-31jan17-en.pdf

A./

Here is a list of all nonspacing marks in the Thai script:

top vowels (Combining Class 0, Not_Reordered):  ← this seems to be wrong; 
should be 103
THAI CHARACTER MAI HAN-AKAT     ั
THAI CHARACTER SARA I   ิ
THAI CHARACTER SARA II  ี
THAI CHARACTER SARA UE  ึ
THAI CHARACTER SARA UEE ื

bottom vowels (Combining Class 103):
THAI CHARACTER SARA U   ุ
THAI CHARACTER SARA UU  ู

tone-marks (Combining Class 107):
THAI CHARACTER MAI EK   ่
THAI CHARACTER MAI THO  ้
THAI CHARACTER MAI TRI  ๊
THAI CHARACTER MAI CHATTAWA     ๋

other-marks (Combining Class 0, Not_Reordered): ← this seems to be wrong, 
should be 107
THAI CHARACTER MAITAIKHU        ็
THAI CHARACTER THANTHAKHAT      ์
THAI CHARACTER NIKHAHIT ํ
THAI CHARACTER YAMAKKAN ๎

other-marks (Combining Class 9, Virama)
THAI CHARACTER PHINTHU  ฺ
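
The list above can be reproduced with a short Python sketch, assuming the 
standard unicodedata module and that the Thai block spans U+0E00..U+0E7F. The 
function name thai_nonspacing_marks is a hypothetical helper, not an existing 
API:

```python
import unicodedata


def thai_nonspacing_marks():
    """Return (code point, name, combining class) for each character
    with General_Category Mn in the Thai block U+0E00..U+0E7F."""
    rows = []
    for cp in range(0x0E00, 0x0E80):
        ch = chr(cp)
        if unicodedata.category(ch) == "Mn":
            rows.append((cp, unicodedata.name(ch), unicodedata.combining(ch)))
    return rows


for cp, name, ccc in thai_nonspacing_marks():
    print(f"U+{cp:04X}  ccc={ccc:3d}  {name}")
```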

Gerriet.



