Re: Specification of Encoding of Plain Text

Asmus Freytag Tue, 10 Jan 2017 13:17:20 -0800

On 1/10/2017 12:44 PM, Richard Wordingham wrote:

On Tue, 10 Jan 2017 00:06:05 -0800
Asmus Freytag <[email protected]> wrote:

On 1/9/2017 2:24 PM, Richard Wordingham wrote:

I'll take your last point first.

One might hope that the subsection about 'logical order' in TUS 9.0
Section 2.2 Unicode Design Principles would help, but:

1) Section 3 'Conformance' says nothing about logical order; and
2) The subsection about 'logical order' seems to assume that there
exists a common practice; it does not actually place any requirement
on this common practice.

I don't think either of these general sections is intended to
provide the correct or expected ordering of characters for complex
scripts. Any preferred ordering that doesn't result by happenstance
from normalization would presumably be described in the text of the
script section, such as Section 16.4 Khmer, in TUS 9.0.0.

The key word here is 'preferred'.  Your reply, while not completely
clear, confirms my view that Unicode does not *specify* an overall
character ordering for Khmer, despite the section's having a BNF regexp
for Khmer syllables - B{R|C}{S{R}}*{{Z}V}{O}{S}.


You are possibly misreading my use of the word "preferred".

For example, a naive reading of TUS 9.0 Section 16.4 Subsection
"Ordering of Syllable Components" would lead one to believe that the
word _khnyom_ 'I' shall be encoded as <U+1781 KHMER LETTER KHA,
U+17D2 KHMER SIGN COENG, U+1789 KHMER LETTER NYO, U+17BB KHMER VOWEL
SIGN U, U+17C6 KHMER SIGN NIKAHIT>.

Richard,
the group of Khmer experts that developed the recent label generation
rules for root zone domain names considers that ordering the only one
supported; you can find their specification here:
https://www.icann.org/en/system/files/files/proposal-khmer-lgr-15aug16-en.pdf

But as you acknowledge, the specification only covers a strict subset of
legitimate Khmer script text, even of text composed of encoded Khmer
characters.

The advantage of the text I brought to your attention is the way it is
formalized and that it was created with local expertise. The
disadvantage from your perspective is that the scope does not match with
your intended use case.

It excludes some text given in TUS Section 16.4.  Indeed,
Section 7.4 of the proposal to ICANN even excludes the new spelling of
the word ឱ្យ (ooy, give) - <U+17B1 KHMER INDEPENDENT VOWEL QOO TYPE ONE,
U+17D2 KHMER SIGN COENG, U+1799 KHMER LETTER YO>, for U+17B1 is not a
consonant!

I have ignored the logical gaps in your reply; nothing in the *Unicode*
standard prohibits or deprecates the sequence <U+1781, U+17C6, U+17D2,
U+1789, U+17BB>, even though it does not satisfy the regexp I quoted
above.
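
The comparison against the regexp can be checked mechanically. What
follows is only a minimal sketch in Python, not anything taken from TUS
or from the ICANN proposal: the letters B, R, C, S, Z, V and O follow
the BNF quoted above, but the code point ranges assigned to each class
and the name fits_pattern are assumptions made for illustration, and
the braces of the BNF are read as "optional".

```python
import re

# Assumed character classes for the BNF B{R|C}{S{R}}*{{Z}V}{O}{S}.
# These ranges are illustrative guesses, not normative assignments.
BASE = "\u1780-\u17B3"      # B: consonants and independent vowels
ROBAT = "\u17CC"            # R: KHMER SIGN ROBAT
SHIFTER = "\u17C9\u17CA"    # C: MUUSIKATOAN, TRIISAP
COENG = "\u17D2"            # first half of S
JOINER = "\u200C\u200D"     # Z: ZWNJ, ZWJ
VOWEL = "\u17B6-\u17C5"     # V: dependent vowel signs
OTHER = "\u17C6-\u17C8\u17CB\u17CD-\u17D1\u17DD"  # O: other signs

# S: COENG followed by a base character.
SUBSCRIPT = f"{COENG}[{BASE}]"

SYLLABLE = re.compile(
    f"[{BASE}]"                       # B
    f"(?:{ROBAT}|[{SHIFTER}])?"       # {R|C}
    f"(?:{SUBSCRIPT}{ROBAT}?)*"       # {S{R}}*
    f"(?:[{JOINER}]?[{VOWEL}])?"      # {{Z}V}
    f"[{OTHER}]?"                     # {O}
    f"(?:{SUBSCRIPT})?"               # {S}
)


def fits_pattern(text: str) -> bool:
    """Return True if text is exactly one syllable under the sketch above."""
    return SYLLABLE.fullmatch(text) is not None


# khnyom in the order suggested by the naive reading of Section 16.4
print(fits_pattern("\u1781\u17D2\u1789\u17BB\u17C6"))
# the same characters with NIKAHIT placed before COENG
print(fits_pattern("\u1781\u17C6\u17D2\u1789\u17BB"))
# the new spelling of 'ooy', with an independent vowel as the base
print(fits_pattern("\u17B1\u17D2\u1799"))
```

Under these assumed classes the first and third sequences fit the
pattern and the second does not. A check of this kind says only whether
a sequence fits the quoted pattern; it says nothing about whether the
standard permits the sequence.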

Unicode clearly doesn't forbid most sequences in complex scripts, even
if they cannot be expected to render properly and otherwise would stump
the native reader.

However, the descriptions are detailed enough to let you find out
whether you are using characters as intended, or not.

So, you are not alone in thinking that the COENG goes between
consonants.

I do not support the heresy that COENG may only occur between
consonants.

Remember, I gave you the scope for that. Your example is well taken, but
from a different scope, where explicitly accounting for some other
sequences is necessary. No disagreement.

A./


I do wonder if the Khmer Generation Panel opened their Pali grammars.
How would they propose to write the accusative singular of nouns in
-i?  The accusative singular of non-neuter nouns ends in -iṁ, which I
would expect to be written <U+17B7 KHMER VOWEL SIGN I, U+17C6 KHMER SIGN
NIKAHIT>, which is what I perceive at the end of a line in the middle
of the second left-hand page at
http://watkhemararatanaram.org/tipitaka/viney_beidok_05b.php .  Do they
expect one to use U+17B9 KHMER VOWEL SIGN Y?  (Thai scholars once had
to resort to such an expedient.)

Richard.
