Re: USE Indic Syllabic Category

2019-02-24 Thread Richard Wordingham via Unicode
On Sat, 23 Feb 2019 14:46:27 +0800
梁海 Liang Hai via Unicode  wrote:

> USE wasn’t designed to allow such a syllable structure. Tai Tham’s
> being supported by USE is kind of an oversight. And although it’s
> appropriate to allow conjoined consonants to follow post-base-spacing
> vowel signs,

There's a quick hack there.  As U+1A63 TAI THAM VOWEL SIGN AA and
U+1A64 TAI THAM VOWEL SIGN TALL AA start grapheme clusters, just promote
them to BASE.  It also solves the problem of tone mark placement.  It
does postpone the handling of the  ligature to after the
dissolution of syllable boundaries, which could force unwelcome changes
in a Pali-only Tai Tham font, if such exist. At least one font has an
extensive set of ligatures for the sequences .
I have to handle the ligature after the dissolution because of the
syllable boundary in . 

A quick hack for the likes of Tai Lü ᨻᩭ᩠ᩅ᩻ᩣ  /pɔi vaː/ ‘because’ may be more troublesome
even if one omits U+1A7B TAI THAM SIGN MAI SAM.  You probably won't
like it anyway, because a good rendering looks more like the
nonsense words /pvɔi paː/ or /pvɔi pvaː/.  (I think the cluster /pv/
does not exist in any form in Tai Lü, and that would rule it out.)

Richard.



Re: USE Indic Syllabic Category

2019-02-23 Thread Richard Wordingham via Unicode
On Sat, 23 Feb 2019 14:46:27 +0800
梁海 Liang Hai via Unicode  wrote:

> >>> once the USE acknowledges that subjoined consonants may follow
> >>> vowels
> >> 
> >> I expect to update the USE spec to address this soon.  
> > 
> > That seems welcome news.  I still don't know what the problem with
> > supporting them has been.  
> 
> USE wasn’t designed to allow such a syllable structure. Tai Tham’s
> being supported by USE is kind of an oversight. And although it’s
> appropriate to allow conjoined consonants to follow post-base-spacing
> vowel signs, it’s not really a trivial debate whether USE should
> allow conjoined consonants to non-post-base-spacing (ie, pre-base,
> above-base, and below-base) vowel signs—considering the ambiguity.

What are your thoughts on the handling of 'medial consonants'?  My
best surmise is that the Unicode classification is intended for
subscript consonants that prototypically occur between a phonetically
and orthographically syllable-initial consonant and the possibly
implicit vowel.  Significantly, clusters of medial consonants can occur.

However, I am not sure why they should be treated any differently from
subscript consonants.  My best hypotheses are that:

1) They can lose any segmental significance in the pronunciation of a
word, e.g. being reduced to encoding features, as in Burmese.
2) Their visual positioning in the onset cluster does not relate to the
phonetic order; for example, medial RA may be written before the
cluster without any anchor in the vertical stack.

From the prototypical behaviour, the USE has deduced the rule that a
medial consonant must be followed by a vowel, albeit implicit.  An
implicit vowel does not count if it is removed by a virama (as opposed
to a pure killer).  You have suggested that the Indic Syllabic
Category should reflect the structure of strings in scripts more
closely.  Do you agree that this deduction goes beyond the implications
of the Unicode categorisation as a medial consonant?  Or do you think
that the Unicode concept of 'medial consonant' should be changed?

My feeling is that I should report to Microsoft that the
characterisation of U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and U+1A56
TAI THAM CONSONANT SIGN MEDIAL LA, both with InSC=Consonant_Medial, as
medial consonants, is wrong for the USE.

There are three ways that these signs fail to correspond to the USE's
model of a medial consonant:

1. The Tai Tham sequences  and  can act as
vowels in Tai Tham languages.

2. The implicit vowel following them can be silenced.  Now normally
this should not be a problem, for the vowel killers are categorised as
'pure_killer' (U+1A7A) and 'syllable_modifier' (U+1A7C).  The potential
issue revealed itself when U+1A7A was mistagged as 'halant', implying
'virama'.

3. MEDIAL RA can precede a resonant consonant, as in ᨲᩕ᩠ᨶᩬᨾ  /tʰanɔːm/ (MFL Rev 1 p269).
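
The Unicode-side categories named above can be checked mechanically against
the UCD file IndicSyllabicCategory.txt.  What follows is only a minimal
sketch: SAMPLE is a hand-written stand-in in that file's layout, carrying the
values quoted in this thread, and parse_insc is a hypothetical helper name.
It says nothing about the overrides applied inside the Windows USE.

```python
import re

# Hand-written sample in the layout of IndicSyllabicCategory.txt.
# Assumption: the values are those quoted in this thread, not a real file.
SAMPLE = """\
# code point(s) ; InSC value
1A55..1A56    ; Consonant_Medial
1A7A          ; Pure_Killer
1A7C          ; Syllable_Modifier
"""

# One data line: a code point or a range, a semicolon, then the value.
LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

def parse_insc(text):
    """Return {code point: InSC value} for every data line in text."""
    table = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # comment or blank line
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)
        for cp in range(first, last + 1):
            table[cp] = m.group(3)
    return table

insc = parse_insc(SAMPLE)
for cp in (0x1A55, 0x1A56, 0x1A7A, 0x1A7C):
    print(f"U+{cp:04X} {insc[cp]}")
```

Pointing parse_insc at the real file's contents instead of SAMPLE would give
the published values for comparison with what a shaper appears to do.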

Richard.



Re: USE Indic Syllabic Category

2019-02-23 Thread Richard Wordingham via Unicode
On Sat, 23 Feb 2019 14:46:27 +0800
梁海 Liang Hai via Unicode  wrote:

> >>> once the USE acknowledges that subjoined consonants may follow
> >>> vowels
> >> 
> >> I expect to update the USE spec to address this soon.  
> > 
> > That seems welcome news.  I still don't know what the problem with
> > supporting them has been.  
> 
> USE wasn’t designed to allow such a syllable structure. Tai Tham’s
> being supported by USE is kind of an oversight. And although it’s
> appropriate to allow conjoined consonants to follow post-base-spacing
> vowel signs, it’s not really a trivial debate whether USE should
> allow conjoined consonants to non-post-base-spacing (ie, pre-base,
> above-base, and below-base) vowel signs—considering the ambiguity.

1. "The goal of the clustering logic is to enable what is graphically
consistent with a given script’s rules, rather than enforcing
particular orthographic or linguistic rules. Such considerations should
be applied at another layer, such as a spelling checker." - USE
Specification.

There are very few cases that cannot be resolved by a spell-checker
once word boundaries are resolved.  Pali and Tai phonology (but
Lao is TBC) conspire to keep the numbers down.

2. The UTC membership had this discussion when discussing the proposals
on the Unicore list.

3. Ambiguity is often font-dependent with above- and below-base vowels,
and with tone marks.  Marks above are frequently positioned relative to
the phonetically preceding spacing consonant element - ,
,  and  are common coda ("sakot")
consonants that are spacing.  In Northern Thai,  is
frequently and  can be written with the vowel
largely to the left of the subscript consonant.  Apart from , Northern
Thai largely avoids , preferring the minor ambiguity of, for
example,  being either /huːp/ or /luː paʔ/.  (These two forms are
a doublet.)

4. They're explicitly noted in the TUS for the Khmer script, and I
suspect they're important for Tai languages in the Khmer ('Khom')
script.

5. For visual proofing, one can use colour-coding - people are welcome
to copy the relevant logic from my Da Lekh Si font.  Word processor
support for colour distinctions is limited, but it is in place in
several browsers.  Most of each akshara is in the foreground colour, so
it works with syntax highlighting and similar existing uses of
colour-coding.

6. The Sanskrit clusters grv- and gvr- are ambiguous in several
Sanskrit-capable Indic scripts.  (I haven't yet had the chance to study
how Sanskrit is written in Tai Tham, though I do know of one
inscription.)

7. The ambiguity of  and  was called out when
 was allowed as the usual subscript of U+1A37 TAI THAM
LETTER BA.

8. The biggest ambiguity issue is the use of  for U+1A6C TAI THAM VOWEL SIGN OA BELOW.  The USE is
powerless to deal with this.  I wish someone would let me in on the
evidence that they are actually distinct.

9. Paradoxically, CVC aksharas are being wrongly encoded because of the
USE's poor support for Tai Tham.

HarfBuzz allows an OpenType font to shape Tai Tham text even if it does
not declare support for the script.  Such fonts have to do Indic
rearrangement themselves, and this is generally done by means of
ligatures for .  Consequently, a cluster
 gets encoded as , as there are scores of clusters and five preposed vowels.  I know
it is possible to do rearrangement properly given access to GSUB; I
have a Tai Tham via ASCII mode in my Da Lekh fonts, and I have to do
some rearrangement to clean up after the USE.

There was a brief, happy period when HarfBuzz's SEA shaping engine was
available for Tai Tham, but this was deleted in favour of an
implementation of the USE.  There are now two bunches of Tai Tham
fonts which simply don't work on Microsoft browsers - Graphite fonts
and the DIY OpenType Indic rearrangers.

Richard.



Re: USE Indic Syllabic Category

2019-02-22 Thread 梁海 Liang Hai via Unicode
>>> once the USE acknowledges that subjoined consonants may follow
>>> vowels  
>> 
>> I expect to update the USE spec to address this soon.
> 
> That seems welcome news.  I still don't know what the problem with
> supporting them has been.

USE wasn’t designed to allow such a syllable structure. Tai Tham’s being 
supported by USE is kind of an oversight. And although it’s appropriate to 
allow conjoined consonants to follow post-base-spacing vowel signs, it’s not 
really a trivial debate whether USE should allow conjoined consonants to 
non-post-base-spacing (ie, pre-base, above-base, and below-base) vowel 
signs—considering the ambiguity.
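
The structural point can be illustrated with a toy model.  This is a sketch
under loud assumptions: the class letters B, H and V and both patterns are
invented here for illustration and are not the USE's actual cluster grammar.
It only shows what changes when a conjoined consonant is allowed after a
vowel sign, and does not model the ambiguity question for pre-base,
above-base and below-base vowels.

```python
import re

# Toy class letters (assumptions for this sketch only):
#   B = consonant, H = conjoining sign, V = vowel sign
# STRICT: conjoined consonants may appear only before the vowel signs.
STRICT = re.compile(r"B(?:HB)*V*")
# RELAXED: conjoined consonants may also follow vowel signs.
RELAXED = re.compile(r"B(?:HB)*(?:V+(?:HB)*)*")

def is_one_cluster(classes, grammar):
    """True if the whole class string forms a single cluster under grammar."""
    return grammar.fullmatch(classes) is not None

for seq in ("BHBV", "BVHB", "BVHBV"):
    print(seq, is_one_cluster(seq, STRICT), is_one_cluster(seq, RELAXED))
```

Under STRICT the sequence BVHB is not a single cluster, which is the kind of
place where a shaping engine would break the cluster; under RELAXED it is
accepted whole.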

Best,
梁海 Liang Hai
https://lianghai.github.io

> On Feb 23, 2019, at 09:47, Richard Wordingham via Unicode 
>  wrote:
> 
> On Fri, 22 Feb 2019 22:19:25 +
> Andrew Glass  wrote:
> 
>> Thank you Richard for pointing out the issue with 0x1A7A
>> I've looked into this and found an error in our tooling that has this
>> mapped this to Halant. Based on the spec this should be VAbv. I've
>> filed a bug.
> 
> Thanks.  Will the correction be rolled out to all Microsoft
> Windows 10 customers at about the same time?  I appreciate that
> corporate customers may impose their own extra, internal delays - my
> employer is still on Windows 7.
> 
> In the meantime, I've updated my fonts (Da Lekh and Lamphun) to
> correct the problem.  However, such corrections run the risk of wrongly
> deleting dotted circles that come from the backing store, and so are
> not Unicode-compliant.  The sooner I can remove the corrections, the
> better.
> 
>>> Where can I find the InSc properties of characters as overridden
>>> for the USE of Windows?  
>>  USE spec includes overrides to ISC and IPC:
>>  
>> https://docs.microsoft.com/en-gb/typography/script-development/use#overrides
> 
> I had the impression there were more overrides than just those.
> 
>>> once the USE acknowledges that subjoined consonants may follow
>>> vowels  
>>  I expect to update the USE spec to address this soon.
> 
> That seems welcome news.  I still don't know what the problem with
> supporting them has been.
> 
> Richard.



Re: USE Indic Syllabic Category

2019-02-22 Thread Richard Wordingham via Unicode
On Fri, 22 Feb 2019 22:19:25 +
Andrew Glass  wrote:

> Thank you Richard for pointing out the issue with 0x1A7A
> I've looked into this and found an error in our tooling that has this
> mapped this to Halant. Based on the spec this should be VAbv. I've
> filed a bug.

Thanks.  Will the correction be rolled out to all Microsoft
Windows 10 customers at about the same time?  I appreciate that
corporate customers may impose their own extra, internal delays - my
employer is still on Windows 7.

In the meantime, I've updated my fonts (Da Lekh and Lamphun) to
correct the problem.  However, such corrections run the risk of wrongly
deleting dotted circles that come from the backing store, and so are
not Unicode-compliant.  The sooner I can remove the corrections, the
better.

> > Where can I find the InSc properties of characters as overridden
> > for the USE of Windows?  
>   USE spec includes overrides to ISC and IPC:
>   
> https://docs.microsoft.com/en-gb/typography/script-development/use#overrides

I had the impression there were more overrides than just those.

> > once the USE acknowledges that subjoined consonants may follow
> > vowels  
>   I expect to update the USE spec to address this soon.

That seems welcome news.  I still don't know what the problem with
supporting them has been.

Richard.


Re: USE Indic Syllabic Category

2019-02-22 Thread Asmus Freytag via Unicode

  
  
On 2/22/2019 7:29 AM, Richard Wordingham via Unicode wrote:

> On Fri, 22 Feb 2019 09:07:06 +
> Richard Wordingham via Unicode  wrote:
>
> > My best hypothesis (not thoroughly tested) is that Windows currently
> > has InSc=Consonant_Killer, but can I look his up as opposed to
> > effectively devising a test suite for USE on Office?
>
> That question's rather mangled.  It should have said:
>
> My best hypothesis (not thoroughly tested) is that Windows currently
> has InSc=Consonant_Killer, but can where I look this up as opposed to
> effectively devising a test suite for USE on Windows?
>
> FWIW, HarfBuzz currently has VAbv 'vowel above', in accordance with the
> Unicode 11.0 properties.
>
> Richard.

"can where I" is perhaps not as much an improvement  :)

A./

  



Re: USE Indic Syllabic Category

2019-02-22 Thread Richard Wordingham via Unicode
On Fri, 22 Feb 2019 09:07:06 +
Richard Wordingham via Unicode  wrote:

> My best hypothesis (not thoroughly tested) is that Windows currently
> has InSc=Consonant_Killer, but can I look his up as opposed to
> effectively devising a test suite for USE on Office?

That question's rather mangled.  It should have said:

My best hypothesis (not thoroughly tested) is that Windows currently
has InSc=Consonant_Killer, but can where I look this up as opposed to
effectively devising a test suite for USE on Windows?

FWIW, HarfBuzz currently has VAbv 'vowel above', in accordance with the
Unicode 11.0 properties.

Richard.


USE Indic Syllabic Category

2019-02-22 Thread Richard Wordingham via Unicode
Where can I find the InSc properties of characters as overridden for
the USE of Windows?

I am trying to work out why on MS Edge I am now getting dotted circles
before U+1A7A TAI THAM SIGN RA HAAM in all of:

ᩆᩢᨠ᩠ᨯᩥ᩺ rank /sak/ ,

ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩺ giant fennel /ma haː hiŋ/ 

and

ᩆᩣᩈ᩠ᨲᩕ᩺ science /saːt/ ?
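
When chasing a dotted circle like this, it helps to see exactly which code
point precedes U+1A7A in the backing store.  A minimal sketch using Python's
standard unicodedata module follows; describe and before_each are
hypothetical helper names, and the sample string is an assumed test input
(a medial consonant sign directly followed by RA HAAM), not one of the words
above.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"
            for ch in text]

def before_each(text, target="\u1A7A"):
    """Return the character immediately before each occurrence of target."""
    return [text[i - 1] for i, ch in enumerate(text)
            if ch == target and i > 0]

# Assumed test input: U+1A55 followed by U+1A7A.
sample = "\u1A55\u1A7A"
for line in describe(sample):
    print(line)
print("preceding U+1A7A:", describe("".join(before_each(sample))))
```

Pasting any of the words above into describe lists its stored sequence,
which can then be compared with the categories the shaper is believed to
assign.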

U+1A7A used to have InSC=Syllable_Modifier, for which these would all
work (at the cost of ᩈᩮᩥᩁ᩠᩺ᨷ to serve /sɤːp/  failing), which was then changed to
InSC=Pure_Killer, which will work for all of them once the USE
acknowledges that subjoined consonants may follow vowels (as in
old-fashioned Khmer - see TUS) and that vowels below precede vowels
above in Tai Tham (see Lanna/Tai Tham proposals).

My best hypothesis (not thoroughly tested) is that Windows currently has
InSc=Consonant_Killer, but can I look his up as opposed to
effectively devising a test suite for USE on Office?

Richard.