Re: What is the time frame for USE shapers to provide support for CV+C ?

2019-05-14 Thread Richard Wordingham via Unicode
On Tue, 14 May 2019 03:08:04 +0100
Richard Wordingham via Unicode  wrote:

> Together,
> these call for (Sk B)* to be replaced by ().

Correction:
Together, these call for (Sk B)* to be replaced by ()*.

Richard.


Re: What is the time frame for USE shapers to provide support for CV+C ?

2019-05-13 Thread Richard Wordingham via Unicode
On Tue, 14 May 2019 00:58:07 +
Andrew Glass via Unicode  wrote:

> Here is the essence of the initial changes needed to support CV+C.
> Open to feedback.
> 
> 
>   *   Create new SAKOT class
> SAKOT (Sk) based on UISC = Invisible_Stacker
>   *   Reduced HALANT class
> Now only HALANT (H) based on UISC = Virama
>   *   Updated Standard cluster mode
> 
> [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB >
> [VS] (CMAbv)* (CMBlw)*)* [MPre] [MAbv] [MBlw] [MPst] (VPre)*
> (VAbv)* (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)*
> (FAbv)* (FBlw)* (FPst)* [FM]

This comes a lot closer to supporting Tai Tham monosyllabic clusters.

Although this shouldn't affect Tai Tham, some of those medials need to
be made repeatable; I believe this has already been done in HarfBuzz.

I trust you'll be reclassifying U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA
and U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA into the category SUB so
that we can write about bananas forever (ᨠᩖ᩠ᩅ᩠᩶ᨿᨲᩕ᩠ᩃᩬᨯ):

 /kluai/ 'banana'

 /tʰalɔːt/ 'for ever'

The issues here are that WA in a medial rôle is indistinguishable from
a coda ('sakot') consonant and that MEDIAL RA can act as a consonant
aspirator.

Unfortunately, we didn't define a consonant HIGH RATTHA with a
canonical decomposition to .  The problem is that 'HIGH RATTHA', widely seen as an alternative
form of HIGH RATHA, can act as a subscript coda consonant.  There are
also a couple of words in the Northern Thai Dictionary of Palm-Leaf
Manuscripts where MEDIAL LA acts as a coda consonant.  Together,
these call for (Sk B)* to be replaced by ().

This next question does not, I believe, affect HarfBuzz.  Will NFC
code render as well as unnormalised code?  In the first example above,
 normalises to , which
does not match any portion of the regular expression.
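As an illustration of why normalisation matters here, Python's standard unicodedata module shows how canonical ordering can swap SAKOT and a tone mark. The fragment below is hypothetical and is not the sequence elided from this message; it only shows that a shaper may receive the marks in a different order from the one typed.

```python
import unicodedata

# Hypothetical fragment (not the elided sequence above):
# TAI THAM LETTER WA, SIGN TONE-2, SIGN SAKOT, LETTER LOW YA.
WA, TONE_2, SAKOT, LOW_YA = "\u1A45", "\u1A76", "\u1A60", "\u1A3F"

typed = WA + TONE_2 + SAKOT + LOW_YA
nfc = unicodedata.normalize("NFC", typed)

# SAKOT has canonical combining class 9 and TONE-2 has 230, so canonical
# ordering moves SAKOT in front of the tone mark.
print(unicodedata.combining(SAKOT), unicodedata.combining(TONE_2))  # 9 230
print(" ".join(f"U+{ord(c):04X}" for c in nfc))
# U+1A45 U+1A60 U+1A76 U+1A3F
```

Whatever order the marks were typed in, a shaper working on NFC text sees the SAKOT first, so a cluster grammar applied to NFC text would need to accept that order.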

Richard.



RE: What is the time frame for USE shapers to provide support for CV+C ?

2019-05-13 Thread Andrew Glass via Unicode
Here is the essence of the initial changes needed to support CV+C. Open to 
feedback.


  *   Create new SAKOT class
SAKOT (Sk) based on UISC = Invisible_Stacker
  *   Reduced HALANT class
Now only HALANT (H) based on UISC = Virama
  *   Updated Standard cluster mode

[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] 
(CMAbv)* (CMBlw)*)* [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* 
(VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]


The only required component of a standard cluster is a BASE or BASE_OTHER. A 
cluster may optionally begin with a REPH or CONS_WITH_STACKER. A BASE or 
BASE_OTHER may be followed immediately by a VARIATION_SELECTOR and/or multiple 
CONS_MOD characters in the order CONS_MOD_ABOVE CONS_MOD_BELOW. Multiple 
sequences of a HALANT BASE, a SAKOT BASE, or a CONS_SUB, each with an optional 
VARIATION_SELECTOR and optional CONS_MODs, can occur. The sequence can continue 
with zero or one 
CONS_MED for each cardinal position (Pre, Above, Below, Post); zero to many 
VOWEL characters in each cardinal position; zero to many VOWEL_MODs in each 
cardinal position; zero to many sequences of SAKOT BASE; zero to many 
CONS_FINALs in each of Above, Below, and Post; and lastly, an optional 
FINAL_MOD.
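For anyone wanting to experiment, the Standard cluster grammar above can be transcribed into a Python regular expression over USE category abbreviations. This is a sketch, not the HarfBuzz or Uniscribe implementation: it assumes characters have already been classified, matches the abbreviations as space-separated tokens, and assumes (per this proposal) that U+1A60 is classified Sk and that U+1A6A, the vowel sign in Ed Trager's example later in the thread, is classified VBlw. The helper names are hypothetical.

```python
import re

# Sketch only: assumes each character is already mapped to its USE category
# abbreviation (B, GB, H, Sk, SUB, VBlw, ...); the sequence is matched as a
# space-separated string with a trailing space.

def t(*cats):
    """Exactly one token from the given categories."""
    return "(?:" + "|".join(cats) + ") "

def opt(p):
    """Zero or one occurrence of p."""
    return f"(?:{p})?"

def star(p):
    """Zero or more occurrences of p."""
    return f"(?:{p})*"

# [VS] (CMAbv)* (CMBlw)*
CONS_MODS = opt(t("VS")) + star(t("CMAbv")) + star(t("CMBlw"))

STANDARD = re.compile(
    opt(t("R", "CS"))                                 # [< R | CS >]
    + t("B", "GB") + CONS_MODS                        # < B | GB > [VS] ...
    + star("(?:" + t("H", "Sk") + t("B") + "|" + t("SUB") + ")" + CONS_MODS)
    + "".join(opt(t(c)) for c in ("MPre", "MAbv", "MBlw", "MPst"))
    + "".join(star(t(c)) for c in ("VPre", "VAbv", "VBlw", "VPst"))
    + "".join(star(t(c)) for c in ("VMPre", "VMAbv", "VMBlw", "VMPst"))
    + star(t("Sk") + t("B"))                          # new: (Sk B)* after vowels
    + "".join(star(t(c)) for c in ("FAbv", "FBlw", "FPst"))
    + opt(t("FM"))
)

def is_standard_cluster(cats):
    """True if the whole category sequence forms one Standard cluster."""
    return STANDARD.fullmatch(" ".join(cats) + " ") is not None

print(is_standard_cluster(["B", "VBlw", "Sk", "B"]))  # True  (phonetic order)
print(is_standard_cluster(["B", "Sk", "B", "VBlw"]))  # True  (current workaround)
print(is_standard_cluster(["B", "Sk"]))               # False (no consonant after Sk)
```

Under these assumptions both spellings form single clusters: the workaround through the existing (< H | Sk > B)* slot before the vowels, the phonetic spelling through the new (Sk B)* slot after them.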



  *   Updated Halant-terminated cluster
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] 
(CMAbv)* (CMBlw)*)* < H | Sk >



This is similar to the Standard cluster but terminates in a final HALANT or 
SAKOT after a BASE, BASE_OTHER, or CONS_MOD. When such a HALANT or SAKOT 
follows a BASE, BASE_OTHER, or CONS_MOD it 
will form a cluster. When any character other than a BASE or BASE_OTHER follows 
the HALANT or SAKOT there will be a cluster break between the HALANT or SAKOT 
and the following character. Multiple sequences of a HALANT BASE or SAKOT BASE 
with optional VARIATION_SELECTOR or optional CONS_MOD can occur. A CONS_SUB is 
equivalent to the sequence HALANT BASE.
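The Halant-terminated rule can be sketched the same way. As before, this is an illustrative Python transcription that assumes characters are already mapped to USE category abbreviations and matched as space-separated tokens; the helper names are hypothetical. It only decides whether a sequence is such a cluster and does not model where the following cluster break falls.

```python
import re

# Sketch only: characters are assumed to be pre-mapped to USE categories.

def t(*cats):
    """Exactly one token from the given categories."""
    return "(?:" + "|".join(cats) + ") "

# [VS] (CMAbv)* (CMBlw)*
CONS_MODS = f"(?:{t('VS')})?(?:{t('CMAbv')})*(?:{t('CMBlw')})*"

# [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)*
#   (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)* < H | Sk >
HALANT_TERMINATED = re.compile(
    f"(?:{t('R', 'CS')})?"
    + t("B", "GB") + CONS_MODS
    + f"(?:(?:{t('H', 'Sk')}{t('B')}|{t('SUB')}){CONS_MODS})*"
    + t("H", "Sk")                                   # final HALANT or SAKOT
)

def is_halant_terminated(cats):
    """True if the category sequence is one Halant-terminated cluster."""
    return HALANT_TERMINATED.fullmatch(" ".join(cats) + " ") is not None

print(is_halant_terminated(["B", "H"]))              # True
print(is_halant_terminated(["B", "Sk", "B", "Sk"]))  # True: stack, then SAKOT
print(is_halant_terminated(["B", "VBlw", "Sk"]))     # False: vowel before SAKOT
```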



  *   New Sakot-terminated cluster

[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] 
(CMAbv)* (CMBlw)*)*

[MPre] [MAbv] [MBlw] [MPst]

(VPre)* (VAbv)* (VBlw)* (VPst)*

(VMPre)* (VMAbv)* (VMBlw)* (VMPst)*

(Sk B [VS] (CMAbv)* (CMBlw)*)* Sk



This is similar to the Standard cluster but terminates in a final SAKOT after a 
VOWEL or VOWEL_MOD. When such a SAKOT follows a VOWEL or VOWEL_MOD it will form 
a cluster. When any character other than a BASE or BASE_OTHER follows this 
SAKOT there will be a cluster break between the SAKOT and the following 
character. Multiple sequences of a SAKOT BASE with optional VARIATION_SELECTOR 
or optional CONS_MOD can occur. A CONS_SUB is equivalent to the sequence 
HALANT BASE.

This would allow a consonant to follow a vowel when joined with a Sakot. It 
would support multiple final consonants. It would not support polysyllabic 
chaining of CV+CV+CV etc.
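A similar sketch of the proposed Sakot-terminated rule, under the same assumptions (pre-classified characters, space-separated category tokens, hypothetical helper names), shows a SAKOT after a vowel, and more than one final consonant, forming a cluster:

```python
import re

# Sketch only: characters are assumed to be pre-mapped to USE categories.

def t(*cats):
    """Exactly one token from the given categories."""
    return "(?:" + "|".join(cats) + ") "

def opt(p):
    return f"(?:{p})?"

def star(p):
    return f"(?:{p})*"

CONS_MODS = opt(t("VS")) + star(t("CMAbv")) + star(t("CMBlw"))

SAKOT_TERMINATED = re.compile(
    opt(t("R", "CS")) + t("B", "GB") + CONS_MODS
    + star("(?:" + t("H", "Sk") + t("B") + "|" + t("SUB") + ")" + CONS_MODS)
    + "".join(opt(t(c)) for c in ("MPre", "MAbv", "MBlw", "MPst"))
    + "".join(star(t(c)) for c in ("VPre", "VAbv", "VBlw", "VPst"))
    + "".join(star(t(c)) for c in ("VMPre", "VMAbv", "VMBlw", "VMPst"))
    + star(t("Sk") + t("B") + CONS_MODS)  # (Sk B [VS] (CMAbv)* (CMBlw)*)*
    + t("Sk")                             # final SAKOT
)

def is_sakot_terminated(cats):
    """True if the category sequence is one Sakot-terminated cluster."""
    return SAKOT_TERMINATED.fullmatch(" ".join(cats) + " ") is not None

print(is_sakot_terminated(["B", "VBlw", "Sk"]))             # True
print(is_sakot_terminated(["B", "VBlw", "Sk", "B", "Sk"]))  # True: two codas
print(is_sakot_terminated(["B", "VBlw", "Sk", "B"]))        # False
```

The last sequence ends in a BASE, so under the proposal it is matched by the Standard cluster's (Sk B)* slot instead.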

Cheers,

Andrew


From: Behdad Esfahbod
Sent: 10 May 2019 11:32
To: Ed Trager
Cc: Andrew Glass; Unicode Mailing List
Subject: Re: What is the time frame for USE shapers to provide support for CV+C ?

I'm open to doing that if there's consensus on how it should be done.

On Thu, May 9, 2019 at 8:55 AM Ed Trager <ed.tra...@gmail.com> wrote:
Hi, Andrew and Behdad,

Prompted by a conversation I had with Liang Hai yesterday, I am just curious to 
get some idea about the following:

(1) When can we anticipate that the USE spec will be updated to provide support 
for subjoined consonants below vowels (as required for TAI THAM) ?

(2) Once the USE spec is updated, how much lag time can we expect until 
Microsoft actually releases an implementation with said support for CV+C ?

(3a) And the related question —for Behdad and the HarfBuzz development group— 
is when can we expect to see CV+C support (at least for TAI THAM) in HarfBuzz ?

(3b) Would the HarfBuzz team consider providing CV+C support for TAI THAM even 
before the USE spec gets updated, so that we could test things out ? * **

---
* PLEASE AND THANK YOU?

** A good use case is the Tai Tham word U+1A27 U+1A6A U+1A60 U+1A37 , 
transcribed to Central Thai script as จูบ, (to kiss). Currently, people are 
writing this as U+1A27 U+1A60 U+1A37 U+1A6A ("จบู") which violates the 
"phonetic ordering" but is the current workaround because USE is still broken 
for TAI THAM.

REFERENCE DOCUMENT:
http://www.unicode.org/L2/L2018/18332-tai-tham-ad-hoc-report.pdf




--
behdad

Re: What is the time frame for USE shapers to provide support for CV+C ?

2019-05-09 Thread Richard Wordingham via Unicode
On Thu, 9 May 2019 11:55:23 -0400
Ed Trager via Unicode  wrote:
 
> ** A good use case is the Tai Tham word U+1A27 U+1A6A U+1A60 U+1A37 ,
> transcribed to Central Thai script as จูบ, (*to kiss*). Currently,
> people are writing this as U+1A27 U+1A60 U+1A37 U+1A6A ("จบู") which
> violates the "phonetic ordering" but is the current workaround
> because USE is still broken for TAI THAM.
> 
> REFERENCE DOCUMENT:
> http://www.unicode.org/L2/L2018/18332-tai-tham-ad-hoc-report.pdf

How is this a good test case?  The 6th preliminary recommendation
reads, "To represent a cluster, regardless of the phonetic order CCV or
CVC, a consonant sign should always be encoded before the vowel sign,
unless the vowel sign has inline advance and is apparently followed by
the consonant sign".  If this recommendation is adopted, then the
spelling "U+1A27 U+1A6A U+1A60 U+1A37" will be  wrong.
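The effect of the 6th preliminary recommendation on this example can be checked mechanically. The sketch below is a hypothetical Python helper covering only this case, a below-base vowel sign (SIGN UU) with no inline advance; it does not implement the recommendation in general.

```python
# Hypothetical helper: covers only a below-base vowel sign with no inline
# advance, where recommendation 6 wants the consonant sign encoded first.
HIGH_CA, VOWEL_SIGN_UU, SAKOT, BA = 0x1A27, 0x1A6A, 0x1A60, 0x1A37

phonetic = [HIGH_CA, VOWEL_SIGN_UU, SAKOT, BA]    # spelling in the quoted example
workaround = [HIGH_CA, SAKOT, BA, VOWEL_SIGN_UU]  # spelling currently in use

def consonant_sign_before_vowel(cps, vowel=VOWEL_SIGN_UU):
    """True if SAKOT + consonant is encoded before the vowel sign."""
    return cps.index(SAKOT) < cps.index(vowel)

print(consonant_sign_before_vowel(phonetic))    # False
print(consonant_sign_before_vowel(workaround))  # True
```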

Now, SIGN U and SIGN UU before subscript BA, HIGH PA and LOW YA aren't
always written as though they followed the subscript consonants in
phonetic order.  Sometimes the vowel sign is written in the bottom left
of the syllable.  Presumably we'll need 3 or 4 new signs:

TAI THAM UNAMBIGUOUS UB

TAI THAM UNAMBIGUOUS UUB

TAI THAM UNAMBIGUOUS UY

TAI THAM UNAMBIGUOUS UUY (?)

I'm not sure that the fourth one can occur.

An example of the contrast is shown in the attached files luynam.png,
with the first orthographic syllable , and
yukya.png, with the first orthographic syllable . 

I wonder how we'd be supposed to encode ᩉᩖᩩ᩠᩶ᨿ (currently  'to crawl'?  The simplest
way would be to encode it as , which currently encodes
the unlikely ᩉᩖ᩠ᨿᩩ᩶. Will good fonts be expected to move the vowel left
and down from the subscript LOW YA to the MEDIAL LA?  Or will we need to
encode it with *TAI THAM UNAMBIGUOUS UY?

Richard.