> On 4 Apr 2017, at 00:00, Asmus Freytag <[email protected]> wrote:
> 
> It is not possible to construct a set of secure network identifiers based on 
> simply
> a) ensuring the string is in NFC
> b) otherwise allowing all of the Thai characters (insofar as they are 
> PVALID in IDNA 2008 [RFC5892]).
> 
> Considerable attention to allowable contexts is required. There is a group in 
> Thailand working on this, but their results have not yet been made public.

Maybe this: Proposal for the Thai Script Root Zone Label Generation Rulesets 
<https://www.icann.org/en/system/files/files/proposal-thai-lgr-15dec16-en.pdf>

But the rules for Root Zone Labels are (rightly) much more restricted than what 
I want:

Any two strings which look (almost?) identical should be normalised into some 
canonical form.
Reason: not to have identical looking filenames in a filesystem.
With the current rules of normalisation there could be 8 different filenames 
all looking identical to “กินครึ่งทิ้งครึ่ง”.
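
One way to arrive at the figure of 8 is sketched below. It assumes that the 
variants come from swapping each top-vowel + tone-mark pair in the word (there 
are three such pairs), and that a renderer draws both orders the same way, 
which depends on the font and shaping engine. The names WORD, TOP_VOWELS, 
TONE_MARKS and spelling_variants are made up for this illustration.

```python
import itertools
import re
import unicodedata

# The example word, written as escapes so the code point order is explicit.
WORD = ("\u0e01\u0e34\u0e19\u0e04\u0e23\u0e36\u0e48\u0e07"
        "\u0e17\u0e34\u0e49\u0e07\u0e04\u0e23\u0e36\u0e48\u0e07")

# Assumption: these are the marks meant by "top-vowel" and "tone-mark".
TOP_VOWELS = "\u0e31\u0e34\u0e35\u0e36\u0e37"  # MAI HAN-AKAT, SARA I, II, UE, UEE
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"        # MAI EK, THO, TRI, CHATTAWA
VOWEL_THEN_TONE = re.compile(f"[{TOP_VOWELS}][{TONE_MARKS}]")


def spelling_variants(text):
    """Return every string obtained by swapping any subset of the
    top-vowel + tone-mark pairs found in text."""
    matches = list(VOWEL_THEN_TONE.finditer(text))
    result = set()
    for flips in itertools.product([False, True], repeat=len(matches)):
        chars = list(text)
        for match, flip in zip(matches, flips):
            if flip:
                i = match.start()
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        result.add("".join(chars))
    return result


variants = spelling_variants(WORD)
# NFC does not reorder these pairs: the top vowels have combining class 0,
# so the canonical ordering step never moves a tone mark across them.
nfc_forms = {unicodedata.normalize("NFC", v) for v in variants}
print(len(variants), len(nfc_forms))
```

With Python's unicodedata module this should print "8 8": eight code point 
sequences, and NFC keeps all eight distinct.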

E.g.:
- both NIKHAHIT + SARA AA and SARA AM should be normalised into the same 
string (whatever this is)
- both top-vowel + tone-mark and tone-mark + top-vowel should be normalised 
into the same string (whatever this is).
etc.
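
The two rules above could be sketched as an extra folding step applied on top 
of (not instead of) Unicode normalisation. This is only an illustration under 
assumptions: the function name fold_thai is made up, the choice of canonical 
form (NIKHAHIT + SARA AA, and top-vowel before tone-mark) is arbitrary, and 
the "etc." cases are not covered.

```python
import re
import unicodedata

SARA_AM = "\u0e33"
NIKHAHIT_SARA_AA = "\u0e4d\u0e32"

# Assumption: these are the marks meant by "top-vowel" and "tone-mark".
TOP_VOWELS = "\u0e31\u0e34\u0e35\u0e36\u0e37"
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"
TONE_THEN_VOWEL = re.compile(f"([{TONE_MARKS}])([{TOP_VOWELS}])")


def fold_thai(text):
    """Hypothetical folding for comparing Thai filenames.

    Rule 1: SARA AM becomes NIKHAHIT + SARA AA.
    Rule 2: tone-mark + top-vowel becomes top-vowel + tone-mark.
    """
    text = unicodedata.normalize("NFC", text)
    text = text.replace(SARA_AM, NIKHAHIT_SARA_AA)
    return TONE_THEN_VOWEL.sub(r"\2\1", text)


# NFC alone keeps the two spellings of "KO KAI + AM" apart ...
am_single = "\u0e01\u0e33"
am_split = "\u0e01\u0e4d\u0e32"
print(unicodedata.normalize("NFC", am_single) ==
      unicodedata.normalize("NFC", am_split))
# ... while the folding maps them to the same string.
print(fold_thai(am_single) == fold_thai(am_split))

# The same holds for the two orders of SARA I and MAI THO.
vowel_first = "\u0e17\u0e34\u0e49\u0e07"
tone_first = "\u0e17\u0e49\u0e34\u0e07"
print(fold_thai(vowel_first) == fold_thai(tone_first))
```

A filesystem could then compare fold_thai(name) instead of the raw name when 
checking for duplicates, while still storing the name as the user typed it.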

If, as Richard Wordingham wrote, "Unicode combining classes cannot be changed. 
All that can be done is to enforce the order of characters in normalised 
text", then the Unicode Normalisation algorithms should be updated.


Kind regards,

Gerriet.

