Re: Specification of Encoding of Plain Text

2017-01-10 Thread Asmus Freytag
These kinds of regexes are being developed in various contexts. For example, there's a group developing regexes for Indic scripts for use with CSS. That effort focuses on the syllable, not least because concepts like "first-letter" used in CSS are not

Re: Superscript and Subscript Characters in General Use

2017-01-10 Thread Richard Wordingham
On Tue, 10 Jan 2017 11:03:24 + Alastair Houghton wrote:
> Does anyone besides Marcel have any input on that idea? Is it worth
> writing a proposal to add SUPERSCRIPT and SUBSCRIPT? To give some
> examples:
>
> S^{té}
>
> U+0053 LATIN CAPITAL LETTER S
>

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Asmus Freytag
On 1/10/2017 12:44 PM, Richard Wordingham wrote: On Tue, 10 Jan 2017 00:06:05 -0800 Asmus Freytag wrote: On 1/9/2017 2:24 PM, Richard Wordingham wrote: I'll take your last point first. One might hope that the subsection about 'logical order' in TUS 9.0 Section 2.2

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Richard Wordingham
On Tue, 10 Jan 2017 00:06:05 -0800 Asmus Freytag wrote:
> On 1/9/2017 2:24 PM, Richard Wordingham wrote:
I'll take your last point first.
>> One might hope that the subsection about 'logical order' in TUS 9.0
>> Section 2.2 Unicode Design Principles would help, but:
>>

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Richard Wordingham
On Tue, 10 Jan 2017 13:12:47 -0800 Asmus Freytag wrote:
> Unicode clearly doesn't forbid most sequences in complex scripts,
> even if they cannot be expected to render properly and otherwise
> would stump the native reader.
Is this expectation based on sequence enforcement

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Asmus Freytag
On 1/10/2017 2:54 PM, Richard Wordingham wrote: On Tue, 10 Jan 2017 13:12:47 -0800 Asmus Freytag wrote: Unicode clearly doesn't forbid most sequences in complex scripts, even if they cannot be expected to render properly and otherwise would stump the native reader. Is

Re: Superscript and Subscript Characters in General Use

2017-01-10 Thread Marcel Schneider
On Mon, 9 Jan 2017 14:34:17 -0800, Asmus Freytag wrote: […] > Just get over it […] We are facing a strong user demand since early standards. Actually I cannot. Sorry. Thank you however for all of your feedback. On Tue, 10 Jan 2017 11:03:24 +, Alastair Houghton wrote: […] > […] I think

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Asmus Freytag
On 1/9/2017 2:24 PM, Richard Wordingham wrote: Where, if anywhere, is the encoding of plain text specified? I am particularly concerned with the arrangement of the code sequences for non-spacing abstract characters once one has determined an encoding for the

Re: Superscript and Subscript Characters in General Use

2017-01-10 Thread Frédéric Grosshans
On 10/01/2017 at 12:03, Alastair Houghton wrote: That’s part of it, but I think also that the thread is increasingly verbose and hard to follow. I still think that the idea of adding U+ SUPERSCRIPT and U+ SUBSCRIPT might be worth contemplating; it would seem to provide a good answer

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Richard Wordingham
On Tue, 10 Jan 2017 10:11:41 +0100 Mark Davis ☕️ wrote:
> What I really wish we had would be a machine readable set of regexes
> for each complex script (and for each language-script combination
> that is different than the default for that script).
What would the status of

Re: Superscript and Subscript Characters in General Use

2017-01-10 Thread Alastair Houghton
On 9 Jan 2017, at 22:34, Asmus Freytag wrote:
>
> On 1/9/2017 1:39 PM, Marcel Schneider wrote:
>> Iʼm saddened to have fallen into a monologue. Thus Iʼll quickly debrief
>> the arguments opposed so far, to check whether Iʼm missing some point
>>
> There's a good reason

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Mark Davis ☕️
What I really wish we had would be a machine readable set of regexes for each complex script (and for each language-script combination that is different than the default for that script). Such a regex R could be used for determining the well-formed ordering of code points within words. The regex
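
To make the idea of a per-script well-formedness regex concrete, here is a minimal sketch in Python. It is not taken from the thread or from any Unicode specification: the character classes, the syllable pattern, and the names SYLLABLE, WORD, and is_well_formed are illustrative assumptions covering only a simplified subset of Devanagari, and real orthographies would need considerably richer rules.

```python
import re

# Hypothetical, heavily simplified character classes for Devanagari.
# These ranges are illustrative assumptions, not a normative definition.
CONSONANT = "[\u0915-\u0939]"          # KA .. HA
INDEPENDENT_VOWEL = "[\u0904-\u0914]"  # SHORT A .. AU
NUKTA = "\u093C"
VIRAMA = "\u094D"
VOWEL_SIGN = "[\u093E-\u094C]"         # dependent vowel signs AA .. AU
MODIFIER = "[\u0900-\u0903]"           # candrabindu, anusvara, visarga

# One syllable (assumed structure): either
#   zero or more "consonant + optional nukta + virama" units, then a
#   consonant with optional nukta, then an optional vowel sign or final virama,
# or a single independent vowel; in both cases optionally followed by a modifier.
SYLLABLE = (
    f"(?:(?:{CONSONANT}{NUKTA}?{VIRAMA})*{CONSONANT}{NUKTA}?"
    f"(?:{VOWEL_SIGN}|{VIRAMA})?|{INDEPENDENT_VOWEL}){MODIFIER}?"
)

# A word is one or more syllables.
WORD = re.compile(f"(?:{SYLLABLE})+")


def is_well_formed(word: str) -> bool:
    """Return True if every code point in `word` is in an order the toy regex accepts."""
    return WORD.fullmatch(word) is not None


if __name__ == "__main__":
    print(is_well_formed("\u0939\u093F\u0928\u094D\u0926\u0940"))  # HA, I, NA, VIRAMA, DA, II
    print(is_well_formed("\u093F\u0939"))  # vowel sign placed before its consonant
```

Under these assumptions, a word is accepted only if it can be segmented completely into syllables of the assumed shape, so a dependent vowel sign with no preceding consonant, or two vowel signs in a row, is rejected. A machine-readable set of such patterns, one per script or language-script combination, would let the same check be run without hard-coding script knowledge in the caller.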