Re: Counting Devanagari Aksharas

2017-04-26 Thread Eli Zaretskii via Unicode
> Date: Wed, 26 Apr 2017 07:45:07 +0100 > From: Richard Wordingham via Unicode > > On Wed, 26 Apr 2017 08:48:13 +0300 > Eli Zaretskii via Unicode wrote: > > > > Date: Sun, 23 Apr 2017 22:59:49 +0100 > > > From: Richard Wordingham > > > Cc: Eli Zaretskii > > > > > > If I search for CGJ, highl

Re: Counting Devanagari Aksharas

2017-04-25 Thread Richard Wordingham via Unicode
On Wed, 26 Apr 2017 08:48:13 +0300 Eli Zaretskii via Unicode wrote: > > Date: Sun, 23 Apr 2017 22:59:49 +0100 > > From: Richard Wordingham > > Cc: Eli Zaretskii > > > > If I search for CGJ, highlighting it is frequently supremely > > useless. I want to know where it is; highlighting is merely

Re: Counting Devanagari Aksharas

2017-04-25 Thread Eli Zaretskii via Unicode
> Date: Sun, 23 Apr 2017 22:59:49 +0100 > From: Richard Wordingham > Cc: Eli Zaretskii > > If I search for CGJ, highlighting it is frequently supremely useless. > I want to know where it is; highlighting is merely a tool to find it on > the screen. So I guess this means highlighting is useful a

Re: Go romanize! Re: Counting Devanagari Aksharas

2017-04-25 Thread Naena Guru via Unicode
Quote from below: The word indeed means 'danger' (Pali/Sanskrit _antarāya_). The pronunciation is /ʔontʰalaːi/; the Tai languages that use(d) the Tai Tham script no longer have /r/. The older sequence /tr/ normally became /tʰ/ (except in Lao), but the spelling has not been updated - at least, n

Re: Go romanize! Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 20:53:12 +0530 Naena Guru via Unicode wrote: > Quote by Richard: > Unless this implies a spelling reform for many languages, I'd like to > see how this works for the Tai Tham script. I'm not happy with the > Romanisation I use to work round hostile rendering engines. (My > s

Go romanize! Re: Counting Devanagari Aksharas

2017-04-24 Thread Naena Guru via Unicode
Quote by Richard: Unless this implies a spelling reform for many languages, I'd like to see how this works for the Tai Tham script. I'm not happy with the Romanisation I use to work round hostile rendering engines. (My scheme is only documented in variable hack_ss02 in the last script blocks of

Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 00:36:26 +0530 Naena Guru via Unicode wrote: > The Unicode approach to Sanskrit and all Indic is flawed. Indic > should not be letter-assembly systems. > > Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of > the speech. Each writing system then assigns a sha

Re: Counting Devanagari Aksharas

2017-04-23 Thread Richard Wordingham via Unicode
On Sun, 23 Apr 2017 05:40:29 +0300 Eli Zaretskii via Unicode wrote: > > The cursor moves to the cluster boundary, so there is much less of a > > problem with Emacs. > > But you wanted to highlight only part of the cluster, AFAIU. If I search for CGJ, highlighting it is frequently supremely us
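The complaint above is that a search hit for CGJ is easier to act on as a position than as a highlight. A minimal sketch of that idea follows, assuming a deliberately crude notion of cluster (a base character plus any following characters of general category M*); the function name `find_in_clusters` is hypothetical, and this is neither what Emacs does nor a UAX #29 implementation.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

def find_in_clusters(text, needle):
    """Return (code point index, containing cluster) for each hit of needle.

    Assumption: a "cluster" is a base character followed by combining
    marks (general category M*). Real grapheme clusters are more complex.
    """
    results = []
    start = 0
    for i, ch in enumerate(text):
        # A non-mark character starts a new approximate cluster.
        if i > 0 and not unicodedata.category(ch).startswith("M"):
            start = i
        if ch == needle:
            # Extend to the end of the cluster that contains the hit.
            end = i + 1
            while end < len(text) and unicodedata.category(text[end]).startswith("M"):
                end += 1
            results.append((i, text[start:end]))
    return results

# Usage: report where CGJ sits rather than trying to highlight it.
hits = find_in_clusters("aq\u034F\u0301b", CGJ)
```

Reporting the index together with the enclosing cluster gives a reader something to navigate to even when the matched character has no visible glyph of its own.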

Re: Counting Devanagari Aksharas

2017-04-23 Thread Naena Guru via Unicode
The Unicode approach to Sanskrit and all Indic is flawed. Indic should not be letter-assembly systems. Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of the speech. Each writing system then assigns a shape to the phonetically precise phoneme. The most technically and gramma

Re: Counting Devanagari Aksharas

2017-04-22 Thread Asmus Freytag via Unicode
On 4/22/2017 9:25 PM, Manish Goregaokar via Unicode wrote: Backspace in browsers (Chrome and Firefox) deletes within EGCs too. They delete matras in Devanagari, and jamos in Hangul. They don't *exactly* work off of code points (e.g. flag emoji gets deleted as a whol

Re: Counting Devanagari Aksharas

2017-04-22 Thread Manish Goregaokar via Unicode
> You cannot even > meaningfully move by single characters in most clusters, because > composing characters generally completely changes how the original > characters looked, so there's nowhere you can display the cursor. Yes, and this is one of the reasons it feels broken in Devanagari, you get c

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sun, 23 Apr 2017 00:51:59 +0100 > Cc: Julian Bradfield > From: Richard Wordingham via Unicode > > On Sat, 22 Apr 2017 21:39:42 +0100 (BST) > Julian Bradfield via Unicode wrote: > > > On 2017-04-22, Eli Zaretskii via Unicode wrote: > > > > I could imagine Emacs decomposing characters

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 21:39:42 +0100 (BST) Julian Bradfield via Unicode wrote: > On 2017-04-22, Eli Zaretskii via Unicode wrote: > > I could imagine Emacs decomposing characters temporarily when only > > part of a cluster matches the search string. Assuming this would > > make sense to users of

Re: Counting Devanagari Aksharas

2017-04-22 Thread Julian Bradfield via Unicode
On 2017-04-22, Eli Zaretskii via Unicode wrote: >> From: Richard Wordingham via Unicode [...] >> I've encountered the problem that, while at least I can search for >> text smaller than a cluster, there's no indication in the window of >> where in the window the text is. > > I could imagine Emacs

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sat, 22 Apr 2017 17:13:36 +0100 > From: Richard Wordingham via Unicode > > > Movement by grapheme > > cluster is AFAIK the most natural way of moving in complex scripts. > > Evidence? Personal experience? > It's easiest for displaying the cursor. It's the _only_ way of displaying the

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 13:34:32 +0300 Eli Zaretskii via Unicode wrote: > AFAIR, Emacs allows one to _delete_ individual characters, > i.e. Backspace and C-d delete character-by-character, so the problem > shouldn't be so grave for imperfect typists. Deleting forwards by one _character_ certainly ma

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sat, 22 Apr 2017 11:13:16 +0100 > From: Richard Wordingham via Unicode > > At present these are split into two and three grapheme clusters > respectively, and LibreOffice cursor movement responds accordingly. > (SIGN AA starts a grapheme cluster in several scripts of further > India.) Ho

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 16:27:43 -0700 Manish Goregaokar via Unicode wrote: > > Do Hindi speakers really think of orthographic syllables as > > characters? > > When rendered as a cluster, yes? I've asked around, and folks seem to > insist on coupling it to the rendering. That argues that it's a u

Re: Counting Devanagari Aksharas

2017-04-21 Thread Manish Goregaokar via Unicode
> Do Hindi speakers really think of orthographic syllables as characters? When rendered as a cluster, yes? I've asked around, and folks seem to insist on coupling it to the rendering. Given most fonts render *normal* (common, etc.) clusters, I think making them EGCs and looking at nonrendered clust

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode > wrote: > > Is there consensus on how to count aksharas in the Devanagari > > script? The doubts I have relate to a visible halant in > > orthographic sylla

Re: Counting Devanagari Aksharas

2017-04-21 Thread Manish Goregaokar via Unicode
That seems like a relatively niche use case (especially with Vedic Sanskrit) compared to having weird selection for everything else. I'm not convinced. When I use a romanized Devanagari input method (I typically do on my laptop), deleting the whole cluster is necessary anyway for things to work wel

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 00:08:24 -0500 Anshuman Pandey via Unicode wrote: > > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > > wrote: > > Now imagine you're > > typing Vedic Sanskrit, with its clusters and pitch indicators. > I tried typing Vedic Sanskrit, and it seems to work:

Re: Counting Devanagari Aksharas

2017-04-20 Thread Anshuman Pandey via Unicode
> On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > wrote: > > On Thu, 20 Apr 2017 14:14:00 -0700 > Manish Goregaokar via Unicode wrote: > >> On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode >> wrote: > >>> On Thu, 20 Apr 2017 11:17:05 -0700 >>> Manish Goregaokar

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 14:14:00 -0700 Manish Goregaokar via Unicode wrote: > On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode > wrote: > > On Thu, 20 Apr 2017 11:17:05 -0700 > > Manish Goregaokar via Unicode wrote: > >> I'm of the opinion that Unicode should start considering dev

Re: Counting Devanagari Aksharas

2017-04-20 Thread Manish Goregaokar via Unicode
I mean, we do the same for Hangul. The main time you need intra-conjunct segmentation in Devanagari is when deleting something you just typed. And backspace usually operates on code points anyway (except for some weird cases like flag emoji, though this isn't uniform across platforms). I don't see
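The backspace behaviour described above (delete one code point, except that a flag emoji goes as a unit) can be modelled roughly as follows. This is only an illustrative sketch: the function names are hypothetical, the pairing rule for regional indicators is an assumption, and no real platform's editing code is being reproduced.

```python
def is_regional_indicator(ch):
    # Flag emoji are encoded as pairs of REGIONAL INDICATOR SYMBOL letters.
    return "\U0001F1E6" <= ch <= "\U0001F1FF"

def backspace(text):
    """Delete one code point, or a whole flag if the text ends in one."""
    # Count the run of regional indicators at the end of the text.
    run = 0
    for ch in reversed(text):
        if not is_regional_indicator(ch):
            break
        run += 1
    # An even run means the last two indicators form a complete flag.
    if run >= 2 and run % 2 == 0:
        return text[:-2]
    return text[:-1]

# Usage: a matra or a jamo is removed on its own; a flag is removed whole.
after_matra = backspace("\u0915\u093F")           # KA + VOWEL SIGN I
after_flag = backspace("a\U0001F1EE\U0001F1F3")   # "a" + flag (I, N)
```

Under this model a Devanagari cluster is still taken apart one code point at a time, which matches the thread's point that backspace and cursor movement need not use the same unit.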

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > When given a rendered representation people seem to uniformly count > conjuncts as multiple aksharas if rendered with visible halant, and as > a single akshara if they are rendered conjoined. Now, that's what I expected.

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 15:33:37 +0530 Shriramana Sharma via Unicode wrote: > All I can say is that Tamil script has eschewed most consonant cluster > ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I > used ZWNJ) i.o. श्रीमान्को is quite possible with existing technology. > The

Re: Counting Devanagari Aksharas

2017-04-20 Thread Manish Goregaokar via Unicode
I don't think there's consensus. When given a rendered representation people seem to uniformly count conjuncts as multiple aksharas if rendered with visible halant, and as a single akshara if they are rendered conjoined. Most fonts for Devanagari these days are pretty good at conjoining consonant
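The counting rule reported above can be sketched in code under strong simplifying assumptions: a virama followed directly by a consonant is taken to render conjoined (one akshara), while a virama followed by ZWNJ is taken to render with a visible halant (a separate akshara). The function names and hard-coded consonant ranges are hypothetical, and actual font rendering is ignored entirely.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWNJ = "\u200C"
ZWJ = "\u200D"

def is_consonant(ch):
    # Assumption: Devanagari consonants are U+0915..U+0939 and U+0958..U+095F.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"

def split_aksharas(text):
    """Split text into approximate aksharas (orthographic syllables)."""
    clusters = []
    joining = False  # True right after a virama that may form a conjunct
    for ch in text:
        if ch in (ZWNJ, ZWJ):
            if clusters:
                clusters[-1] += ch
            if ch == ZWNJ:
                joining = False  # ZWNJ asks for a visible halant
            continue
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and (is_mark or (joining and is_consonant(ch))):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        joining = ch == VIRAMA
    return clusters

# The word discussed in the thread, without and with ZWNJ after the
# second virama.
joined = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u0915\u094B"
halant = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u200C\u0915\u094B"
counts = (len(split_aksharas(joined)), len(split_aksharas(halant)))
```

With these inputs the sketch counts three aksharas for the conjoined spelling and four for the ZWNJ spelling. Whether a font lacking the conjunct should change the count is exactly the question the thread leaves open, and this sketch does not address it.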

Re: Counting Devanagari Aksharas

2017-04-20 Thread Shriramana Sharma via Unicode
Hello Richard. Yes my earlier reply wasn't intended to be offlist. I have near-zero knowledge about non-Indic languages. All I can say is that Tamil script has eschewed most consonant cluster ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I used ZWNJ) i.o. श्रीमान्को is quite

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
I was offered the following reply: > To my knowledge except in Tamil script vowelless consonants in > written form aren't considered as separate "akshara"s in native > terminology. Word-finally they seem to be being treated as such. To be more precise, a final cluster of one or more consonants