On Mon, 3 Apr 2017 14:12:51 +0700 "Gerriet M. Denkmann" <[email protected]> wrote:
> The Combining Class is used for normalisation of strings.
> Normalisation of strings is important for filenames in filesystems.
>
> As far as I know, a Thai consonant (Lo, Other_Letter) can have
> several Nonspacing_Marks. This cluster of nonspacing marks can
> contain at most one top/bottom vowel and at most one tone/other mark.
> There is no syntactically meaning in the order of these nonspacing
> marks.

You're confusing the modern Thai language with the Thai script. It
seems that the Lao-style usage of NIKHAHIT as a vowel is known from
older Thai writing, and when used this way it could of course take a
tone mark. It also seems that the pressure to have both MAITAIKHU and
a tone mark on a consonant has been accepted for at least one minority
language.

> So: All top/bottom vowels should have Combining Class 103, all
> tone/other marks have Combining Class 107.
> Is there a reason for having top vowels or other-marks with Combining
> Class 0, Not_Reordered?

It does make one wonder if someone hated Thais. It would have been a
lot simpler, and have worked better, if the combining classes for
Latin diacritics had been used. As it is, only one common combination
of vowel below and mark above was catered for - SARA U/UU with tone
mark. The system doesn't even cater for SARA U + THANTHAKHAT, as in
พันธุ์ทิพย์ 'Phanthip'.

The use of values peculiar to Thai (103 and 107) does not help when
minority languages use Latin diacritics, such as U+0331 COMBINING
MACRON BELOW and U+0303 COMBINING TILDE for Pattani Malay.

The viramas that were recognised were given combining class 9;
YAMAKKAN and THANTHAKHAT were overlooked. One of the looming problems
is that several languages use a combination of PHINTHU and SARA I -
both orders are used, though they are not canonically equivalent.

Richard.
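
For anyone who wants to check the combining classes discussed in this message, here is a minimal sketch using Python's standard `unicodedata` module. The helper names (`ccc`, `canonically_equivalent`) and the choice of KO KAI as the base consonant are illustrative assumptions, not part of the original discussion; the values reported depend on the Unicode data bundled with the Python build, though these particular combining classes are long-standing.

```python
import unicodedata

# Code points for the characters named in the message.
KO_KAI = "\u0E01"       # base consonant, chosen only for illustration
SARA_I = "\u0E34"       # vowel above
SARA_U = "\u0E38"       # vowel below
PHINTHU = "\u0E3A"      # virama
MAI_EK = "\u0E48"       # tone mark
THANTHAKHAT = "\u0E4C"  # cancellation mark


def ccc(ch):
    """Return the Canonical_Combining_Class of a single character."""
    return unicodedata.combining(ch)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # Print the combining class of each mark, plus the two Latin diacritics.
    for ch in (SARA_I, SARA_U, PHINTHU, MAI_EK, THANTHAKHAT, "\u0331", "\u0303"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={ccc(ch)}")

    # Compare both orders of each pair of marks on the same base consonant.
    pairs = [
        ("SARA U / MAI EK", SARA_U, MAI_EK),
        ("SARA U / THANTHAKHAT", SARA_U, THANTHAKHAT),
        ("PHINTHU / SARA I", PHINTHU, SARA_I),
    ]
    for label, first, second in pairs:
        same = canonically_equivalent(KO_KAI + first + second,
                                      KO_KAI + second + first)
        print(f"{label}: both orders canonically equivalent? {same}")
```

Only the SARA U / tone mark pair is reordered by normalisation, because both marks have non-zero classes (103 and 107). A mark with class 0 acts as a starter and blocks reordering, so the two orders of SARA U + THANTHAKHAT, and of PHINTHU + SARA I, remain distinct strings.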

