Re: USE Indic Syllabic Category
On Sat, 23 Feb 2019 14:46:27 +0800
梁海 Liang Hai via Unicode wrote:

> USE wasn’t designed to allow such a syllable structure. Tai Tham’s
> being supported by USE is kind of an oversight. And although it’s
> appropriate to allow conjoined consonants to follow post-base-spacing
> vowel signs,

There's a quick hack there. As U+1A63 TAI THAM VOWEL SIGN AA and U+1A64 TAI THAM VOWEL SIGN TALL AA start grapheme clusters, just promote them to BASE. It also solves the problem of tone mark placement.

It does postpone the handling of the ligature to after the dissolution of syllable boundaries, which could force unwelcome changes in a Pali-only Tai Tham font, if such exist. At least one font has an extensive set of ligatures for the sequences . I have to handle the ligature after the dissolution because of the syllable boundary in .

A quick hack for the likes of Tai Lü ᨻᩭ᩠ᩅ᩻ᩣ /pɔi vaː/ ‘because’ may be more troublesome even if one omits U+1A7B TAI THAM SIGN MAI SAM. You probably won't like it anyway, because a good rendering looks more like the nonsense words /pvɔi paː/ or /pvɔi pvaː/. (I think the cluster /pv/ does not exist in any form in Tai Lü, and that would rule it out.)

Richard.
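
[Editorial sketch] The "promote them to BASE" hack can be pictured with a toy model. Everything below is an assumption made for illustration: the three-entry class table, the class labels, and the rule that every base opens a new cluster are not taken from the USE specification or from any shaping engine.

```python
# Toy model only: the class table and the segmentation rule are assumptions
# made for illustration, not data or logic taken from any shaping engine.

# Hypothetical default classes for three Tai Tham characters.
DEFAULT_CLASSES = {
    0x1A37: "B",     # TAI THAM LETTER BA
    0x1A63: "VPst",  # TAI THAM VOWEL SIGN AA
    0x1A64: "VPst",  # TAI THAM VOWEL SIGN TALL AA
}

# The hack described above: treat both AA signs as bases.
PROMOTED = {0x1A63: "B", 0x1A64: "B"}


def classify(ch, overrides=None):
    """Return the toy class of a character, applying overrides first."""
    cp = ord(ch)
    if overrides and cp in overrides:
        return overrides[cp]
    return DEFAULT_CLASSES.get(cp, "O")


def clusters(text, overrides=None):
    """Split text so that every base ("B") opens a new cluster."""
    result = []
    for ch in text:
        if classify(ch, overrides) == "B" or not result:
            result.append(ch)
        else:
            result[-1] += ch
    return result


sample = "\u1A37\u1A63"  # LETTER BA followed by VOWEL SIGN AA
print(len(clusters(sample)))            # default table: AA stays with BA
print(len(clusters(sample, PROMOTED)))  # promoted: AA opens its own cluster
```

In this toy, promotion moves the cluster boundary to just before the AA sign, which is why any ligature spanning that boundary has to be formed later, after the syllable boundaries have been dissolved.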
Re: USE Indic Syllabic Category
On Sat, 23 Feb 2019 14:46:27 +0800
梁海 Liang Hai via Unicode wrote:

> >>> once the USE acknowledges that subjoined consonants may follow
> >>> vowels
> >>
> >> I expect to update the USE spec to address this soon.
> >
> > That seems welcome news. I still don't know what the problem with
> > supporting them has been.
>
> USE wasn’t designed to allow such a syllable structure. Tai Tham’s
> being supported by USE is kind of an oversight. And although it’s
> appropriate to allow conjoined consonants to follow post-base-spacing
> vowel signs, it’s not really a trivial debate whether USE should
> allow conjoined consonants to non-post-base-spacing (ie, pre-base,
> above-base, and below-base) vowel signs—considering the ambiguity.

What are your thoughts on the handling of 'medial consonants'?

My best surmise is that the Unicode classification is intended for subscript consonants that prototypically occur between a phonetically and orthographically syllable-initial consonant and the possibly implicit vowel. Significantly, clusters of medial consonants can occur. However, I am not sure why they should be treated any differently from subscript consonants. My best hypotheses are that:

1) They can lose any segmental significance in the pronunciation of a word, e.g. being reduced to encoding features, as in Burmese.

2) Their visual positioning in the onset cluster does not relate to the phonetic order; for example, medial RA may be written before the cluster without any anchor in the vertical stack.

From the prototypical behaviour, the USE has deduced the rule that a medial consonant must be followed by a vowel, albeit implicit. An implicit vowel does not count if it is removed by a virama (as opposed to a pure killer).

You have suggested that the Indic Syllabic Category should reflect the structure of strings in scripts more closely. Do you agree that this deduction goes beyond the implications of the Unicode categorisation as a medial consonant?
Or do you think that the Unicode concept of 'medial consonant' should be changed?

My feeling is that I should report to Microsoft that the characterisation of U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA, both with InSC=Consonant_Medial, as medial consonants, is wrong for the USE. There are three ways that these signs fail to correspond to the USE's model of a medial consonant:

1. The Tai Tham sequences and can act as vowels in Tai Tham languages.

2. The implicit vowel following them can be silenced. Now normally this should not be a problem, for the vowel killers are categorised as 'pure_killer' (U+1A7A) and 'syllable_modifier' (U+1A7C). The potential issue revealed itself when U+1A7A was mistagged as 'halant', implying 'virama'.

3. MEDIAL RA can precede a resonant consonant, as in ᨲᩕ᩠ᨶᩬᨾ /tʰanɔːm/ (MFL Rev 1 p269).

Richard.
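
[Editorial sketch] The categories named in this message can be held in a small lookup table. The parser below handles the usual Unicode Character Database layout of "code point or range ; value # comment". The sample lines are hand-written from the values quoted in the message rather than copied from the published data file, and the function name is made up for this example.

```python
def parse_ucd_property(lines):
    """Map each code point to its property value.

    Accepts lines of the form 'XXXX ; Value' or 'XXXX..YYYY ; Value',
    with optional '#' comments; blank and comment-only lines are skipped.
    """
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cps, value = (part.strip() for part in data.split(";", 1))
        first, _, last = cps.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            table[cp] = value
    return table


# Hand-written sample using the values stated in the message above.
SAMPLE = """\
# code point(s) ; Indic_Syllabic_Category
1A55..1A56    ; Consonant_Medial   # MEDIAL RA, MEDIAL LA
1A7A          ; Pure_Killer        # SIGN RA HAAM
1A7C          ; Syllable_Modifier
""".splitlines()

insc = parse_ucd_property(SAMPLE)
for cp in sorted(insc):
    print(f"U+{cp:04X} {insc[cp]}")
```

A table like this says nothing about what a given shaping engine does with those values; it only records the property assignments under discussion.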
Re: USE Indic Syllabic Category
On Sat, 23 Feb 2019 14:46:27 +0800
梁海 Liang Hai via Unicode wrote:

> >>> once the USE acknowledges that subjoined consonants may follow
> >>> vowels
> >>
> >> I expect to update the USE spec to address this soon.
> >
> > That seems welcome news. I still don't know what the problem with
> > supporting them has been.
>
> USE wasn’t designed to allow such a syllable structure. Tai Tham’s
> being supported by USE is kind of an oversight. And although it’s
> appropriate to allow conjoined consonants to follow post-base-spacing
> vowel signs, it’s not really a trivial debate whether USE should
> allow conjoined consonants to non-post-base-spacing (ie, pre-base,
> above-base, and below-base) vowel signs—considering the ambiguity.

1. "The goal of the clustering logic is to enable what is graphically consistent with a given script’s rules, rather than enforcing particular orthographic or linguistic rules. Such considerations should be applied at another layer, such as a spelling checker." - USE Specification. There are very few cases that cannot be resolved by a spell-checker once word boundaries are resolved. Pali and Tai phonology (but Lao is TBC) conspire to keep the numbers down.

2. The UTC membership had this discussion when discussing the proposals on the Unicore list.

3. Ambiguity is often font-dependent with above- and below-base vowels, and with tone marks. Marks above are frequently positioned relative to the phonetically preceding spacing consonant element - , , and are common coda ("sakot") consonants that are spacing. In Northern Thai, is frequently and can be written with the vowel largely to the left of the subscript consonant. Apart from , Northern Thai largely avoids , preferring the minor ambiguity of, for example, being either /huːp/ or /luː paʔ/. (These two forms are a doublet.)

4. They're explicitly noted in the TUS for the Khmer script, and I suspect they're important for Tai languages in the Khmer ('Khom') script.

5. For visual proofing, one can use colour-coding - people are welcome to copy the relevant logic from my Da Lekh Si font. Word processor support for colour distinctions is limited, but it is in place in several browsers. Most of each akshara is in the foreground colour, so it works with syntax highlighting and similar existing uses of colour-coding.

6. The Sanskrit clusters grv- and gvr- are ambiguous in several Sanskrit-capable Indic scripts. (I haven't yet had the chance to study how Sanskrit is written in Tai Tham, though I do know of one inscription.)

7. The ambiguity of and was called out when was allowed as the usual subscript of U+1A37 TAI THAM LETTER BA.

8. The biggest ambiguity issue is the use of for U+1A6C TAI THAM VOWEL SIGN OA BELOW. The USE is powerless to deal with this. I wish someone would let me in on the evidence that they are actually distinct.

9. There is actually a problem with CVC aksharas being wrongly encoded paradoxically because of USE's poor support for Tai Tham. HarfBuzz allows an OpenType font to shape Tai Tham text even if it does not declare support for the script. Such fonts have to do Indic rearrangement themselves, and this is generally done by means of ligatures for . Consequently, a cluster gets encoded as , as there are scores of clusters and five preposed vowels. I know it is possible to do rearrangement properly given access to GSUB; I have a Tai Tham via ASCII mode in my Da Lekh fonts, and I have to do some rearrangement to clean up after the USE. There was a brief, happy period when HarfBuzz's SEA shaping engine was available for Tai Tham, but this was deleted in favour of an implementation of the USE.

There are now two bunches of Tai Tham fonts which simply don't work on Microsoft browsers - Graphite fonts and the DIY OpenType Indic rearrangers.

Richard.
Re: USE Indic Syllabic Category
>>> once the USE acknowledges that subjoined consonants may follow
>>> vowels
>>
>> I expect to update the USE spec to address this soon.
>
> That seems welcome news. I still don't know what the problem with
> supporting them has been.

USE wasn’t designed to allow such a syllable structure. Tai Tham’s being supported by USE is kind of an oversight. And although it’s appropriate to allow conjoined consonants to follow post-base-spacing vowel signs, it’s not really a trivial debate whether USE should allow conjoined consonants to non-post-base-spacing (ie, pre-base, above-base, and below-base) vowel signs—considering the ambiguity.

Best,
梁海 Liang Hai
https://lianghai.github.io

> On Feb 23, 2019, at 09:47, Richard Wordingham via Unicode
> wrote:
>
> On Fri, 22 Feb 2019 22:19:25 +
> Andrew Glass wrote:
>
>> Thank you Richard for pointing out the issue with 0x1A7A
>> I've looked into this and found an error in our tooling that has this
>> mapped this to Halant. Based on the spec this should be VAbv. I've
>> filed a bug.
>
> Thanks. Will the correction be rolled out to all Microsoft
> Windows 10 customers at about the same time? I appreciate that
> corporate customers may impose their own extra, internal delays - my
> employer is still on Windows 7.
>
> In the meantime, I've updated my fonts (Da Lekh and Lamphun) to
> correct the problem. However, such corrections run the risk of wrongly
> deleting dotted circles that come from the backing store, and so are
> not Unicode-compliant. The sooner I can remove the corrections, the
> better.
>
>>> Where can I find the InSc properties of characters as overridden
>>> for the USE of Windows?
>> USE spec includes overrides to ISC and IPC:
>>
>> https://docs.microsoft.com/en-gb/typography/script-development/use#overrides
>
> I had the impression there were more overrides than just those.
>
>>> once the USE acknowledges that subjoined consonants may follow
>>> vowels
>> I expect to update the USE spec to address this soon.
>
> That seems welcome news. I still don't know what the problem with
> supporting them has been.
>
> Richard.
Re: USE Indic Syllabic Category
On Fri, 22 Feb 2019 22:19:25 +
Andrew Glass wrote:

> Thank you Richard for pointing out the issue with 0x1A7A
> I've looked into this and found an error in our tooling that has this
> mapped this to Halant. Based on the spec this should be VAbv. I've
> filed a bug.

Thanks. Will the correction be rolled out to all Microsoft Windows 10 customers at about the same time? I appreciate that corporate customers may impose their own extra, internal delays - my employer is still on Windows 7.

In the meantime, I've updated my fonts (Da Lekh and Lamphun) to correct the problem. However, such corrections run the risk of wrongly deleting dotted circles that come from the backing store, and so are not Unicode-compliant. The sooner I can remove the corrections, the better.

> > Where can I find the InSc properties of characters as overridden
> > for the USE of Windows?
> USE spec includes overrides to ISC and IPC:
>
> https://docs.microsoft.com/en-gb/typography/script-development/use#overrides

I had the impression there were more overrides than just those.

> > once the USE acknowledges that subjoined consonants may follow
> > vowels
> I expect to update the USE spec to address this soon.

That seems welcome news. I still don't know what the problem with supporting them has been.

Richard.
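
[Editorial sketch] The mismatch reported for U+1A7A can be written down as a comparison of per-engine class tables. The three tables below contain only the single values mentioned in this thread (VAbv per the spec and in HarfBuzz, Halant in the faulty Windows tooling). They are hand-entered for illustration and are not extracted from either engine, and the function name is hypothetical.

```python
# Hand-entered from the thread; not extracted from any engine's data.
SPEC = {0x1A7A: "VAbv"}
HARFBUZZ = {0x1A7A: "VAbv"}
REPORTED_WINDOWS = {0x1A7A: "Halant"}


def disagreements(expected, actual):
    """Return (code point, expected class, actual class) for each mismatch."""
    rows = []
    for cp in sorted(expected):
        got = actual.get(cp)  # None if the table has no entry at all
        if got != expected[cp]:
            rows.append((f"U+{cp:04X}", expected[cp], got))
    return rows


for row in disagreements(SPEC, REPORTED_WINDOWS):
    print(*row)
print(len(disagreements(SPEC, HARFBUZZ)), "mismatches against HarfBuzz")
```

With a published list of the overrides an engine actually applies, a comparison of this kind could replace trial-and-error testing of rendered text.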
Re: USE Indic Syllabic Category
On 2/22/2019 7:29 AM, Richard Wordingham via Unicode wrote:

> On Fri, 22 Feb 2019 09:07:06 +
> Richard Wordingham via Unicode wrote:
>
>> My best hypothesis (not thoroughly tested) is that Windows currently
>> has InSc=Consonant_Killer, but can I look his up as opposed to
>> effectively devising a test suite for USE on Office?
>
> That question's rather mangled. It should have said:
>
> My best hypothesis (not thoroughly tested) is that Windows currently
> has InSc=Consonant_Killer, but can where I look this up as opposed to
> effectively devising a test suite for USE on Windows?
>
> FWIW, HarfBuzz currently has VAbv 'vowel above', in accordance with
> the Unicode 11.0 properties.
>
> Richard.

"can where I" is perhaps not as much an improvement :)

A./
Re: USE Indic Syllabic Category
On Fri, 22 Feb 2019 09:07:06 +
Richard Wordingham via Unicode wrote:

> My best hypothesis (not thoroughly tested) is that Windows currently
> has InSc=Consonant_Killer, but can I look his up as opposed to
> effectively devising a test suite for USE on Office?

That question's rather mangled. It should have said:

My best hypothesis (not thoroughly tested) is that Windows currently has InSc=Consonant_Killer, but can where I look this up as opposed to effectively devising a test suite for USE on Windows?

FWIW, HarfBuzz currently has VAbv 'vowel above', in accordance with the Unicode 11.0 properties.

Richard.
USE Indic Syllabic Category
Where can I find the InSc properties of characters as overridden for the USE of Windows?

I am trying to work out why on MS Edge I am now getting dotted circles before U+1A7A TAI THAM SIGN RA HAAM in all of: ᩆᩢᨠ᩠ᨯᩥ᩺ rank /sak/, ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩺ giant fennel /ma haː hiŋ/ and ᩆᩣᩈ᩠ᨲᩕ᩺ science /saːt/?

U+1A7A used to have InSC=Syllable_Modifier, for which these would all work (at the cost of ᩈᩮᩥᩁ᩠᩺ᨷ to serve /sɤːp/ failing), which was then changed to InSC=Pure_Killer, which will work for all of them once the USE acknowledges that subjoined consonants may follow vowels (as in old-fashioned Khmer - see TUS) and that vowels below precede vowels above in Tai Tham (see Lanna/Tai Tham proposals).

My best hypothesis (not thoroughly tested) is that Windows currently has InSc=Consonant_Killer, but can I look his up as opposed to effectively devising a test suite for USE on Office?

Richard.
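
[Editorial sketch] When investigating where a dotted circle is inserted, it helps to list the code points of the failing string in backing-store order. The helper below uses only Python's standard unicodedata module; the function name is made up for this example. Note that unicodedata exposes character names and general categories but not the Indic Syllabic Category, so this shows the sequence, not how an engine classifies it.

```python
import unicodedata


def describe(text):
    """Return (U+XXXX, character name) pairs in backing-store order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The sign under discussion; paste a failing word here to see its sequence.
for code_point, name in describe("\u1A7A"):
    print(code_point, name)
```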