Coloured Characters (was: 0027, 02BC, 2019, or a new character?)

2018-02-21 Thread Richard Wordingham via Unicode
On Wed, 21 Feb 2018 16:28:14 +0100 Philippe Verdy via Unicode wrote: > I even hope that there will be a setting in all browsers, OS'es, > mobiles, and apps to refuse any colorful rendering, and just render > them as monochromatic symbols. In summary, COMPLETELY DISABLE the > colorful extensions o

Re: metric for block coverage

2018-02-20 Thread Richard Wordingham via Unicode
On Tue, 20 Feb 2018 15:13:16 + "Dreiheller, Albrecht via Unicode" wrote: > Could someone please supply an example (web link ...) for usage of > danda / double danda in Tamil? Thanks, Albrecht Take your pick from http://www.prapatti.com/slokas/slokasbyname.html . Do they meet your requirement

Re: metric for block coverage

2018-02-19 Thread Richard Wordingham via Unicode
On Mon, 19 Feb 2018 20:02:28 +0100 Philippe Verdy via Unicode wrote: > This pair of punctuation should have been considered since long as > common punctuations (independently of their assigned names), i.e. > assigned the script property "Comn" and not "Deva". I don't see why > they could not be u

Re: Why so much emoji nonsense?

2018-02-18 Thread Richard Wordingham via Unicode
On Sat, 17 Feb 2018 22:31:12 -0800 James Kass via Unicode wrote: > It's true that added features can make for a better presentation. > Removing the special features shouldn't alter the message. I think I've encountered the use of italics in novels for sotto voce or asides. > The Unicode Standar

Re: Unicode of Death 2.0

2018-02-18 Thread Richard Wordingham via Unicode
On Sun, 18 Feb 2018 14:13:22 +0100 Philippe Verdy via Unicode wrote: > But any operation in OpenType that requires reordering requires a > glyphs buffer. This could even apply to Latin if Microsoft really > intends to support normalization (i.e. canonical equivalences) in its > own USE engine (fo

Re: metric for block coverage

2018-02-18 Thread Richard Wordingham via Unicode
On Sun, 18 Feb 2018 13:05:29 +0100 Adam Borowski via Unicode wrote: > On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass wrote: > > You probably already know that basic script coverage information is > > stored internally in OpenType fonts in the OS/2 table. > > > > https://docs.microsoft.com

Re: IDC's versus Egyptian format controls

2018-02-17 Thread Richard Wordingham via Unicode
On Fri, 16 Feb 2018 18:05:41 -0800 James Kass via Unicode wrote: > Richard Wordingham wrote: > > > One can argue that once the compound ideograph have been encoded, > > the IDS should no longer be interpreted. > > Wouldn't that break existing data? If this sort of thing were done at > OS or

Re: IDC's versus Egyptian format controls

2018-02-16 Thread Richard Wordingham via Unicode
On Fri, 16 Feb 2018 15:25:22 -0800 James Kass via Unicode wrote: > Some people studying Han characters use the IDCs to illustrate the > ideographs and their components for various purposes. For example: > > U-0002A8B8 𪢸 ⿰土土 > U-0002A8B9 𪢹 ⿰土凡 > U-0002A8BA 𪢺 ⿱夂土 > U-0002A8BB 𪢻 ⿰土亡 > U-0002A8BC 𪢼

Re: IDC's versus Egyptian format controls

2018-02-16 Thread Richard Wordingham via Unicode
On Fri, 16 Feb 2018 11:10:29 -0800 Ken Whistler via Unicode wrote: > On 2/16/2018 11:00 AM, Asmus Freytag via Unicode wrote: > > On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote: > >> That doesn't square well with, "An implementation *may* render a > &

Re: IDC's versus Egyptian format controls

2018-02-16 Thread Richard Wordingham via Unicode
On Fri, 16 Feb 2018 08:22:23 -0800 Ken Whistler via Unicode wrote: > On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote: > > > A more portable solution for ideographs is to render an Ideographic > > Description Sequences (IDS) as approximations to the characters they

Re: Why so much emoji nonsense?

2018-02-16 Thread Richard Wordingham via Unicode
On Fri, 16 Feb 2018 10:57:57 + Phake Nick via Unicode wrote: > 2. Actually, the problem is not just limited to emoji. Many > Ideographic characters (Chinese, Japanese, etc) are adding to the > unicode each years, while at the current rate there are still many > rooms in Unicode standard to co

Re: Origin of Alphasyllabaries (was: Why so much emoji nonsense?)

2018-02-16 Thread Richard Wordingham via Unicode
On Fri, 16 Feb 2018 11:42:51 +0100 Philippe Verdy via Unicode wrote: > I said the opposite: the alphabets, abjads, abugidas and today's full > syllabaries derive from early simplified syllabaries,... In the Old World, alphabets and abugidas derive from abjads, which do not derive from syllabarie

Origin of Alphasyllabaries (was: Why so much emoji nonsense?)

2018-02-15 Thread Richard Wordingham via Unicode
On Wed, 14 Feb 2018 21:49:57 +0100 Philippe Verdy via Unicode wrote: > The concept of vowels as distinctive letters came later, even the > letter A was initially a representation of a glottal stop consonant, > sometimes mute, only written to indicate a word that did not start by > a consonant i

Re: Why so much emoji nonsense?

2018-02-15 Thread Richard Wordingham via Unicode
On Wed, 14 Feb 2018 17:49:05 -0800 James Kass via Unicode wrote: > I've personally exchanged text data with others using the PUA for both > Klingon and Ewellic. [winks] But wasn't that using a supplementary standard, the ConScript Unicode Registry? Richard.

Re: Why so much emoji nonsense? - Proscription

2018-02-15 Thread Richard Wordingham via Unicode
On Thu, 15 Feb 2018 21:38:19 + Shawn Steele via Unicode wrote: > I realize "I'd've" isn't > "right", Where did that proscription come from? Is it perhaps a perversion of the proscription of "I'd of"? Richard.

Re: What is U+0E46 THAI CHARACTER MAIYAMOK?

2018-02-08 Thread Richard Wordingham via Unicode
On Wed, 7 Feb 2018 20:16:21 -0800 James Kass via Unicode wrote: > Is there a contrasting use where this mark is not used with words? > Maybe numbers? The only other use I've seen is quotation of the mark - putting it in parentheses seems quite common. Richard.

Re: What is U+0E46 THAI CHARACTER MAIYAMOK?

2018-02-08 Thread Richard Wordingham via Unicode
On Wed, 7 Feb 2018 22:02:28 -0800 Asmus Freytag via Unicode wrote: > On 2/7/2018 9:23 PM, Theppitak Karoonboonyanan via Unicode wrote: > An apparent way to do it properly is to use NBSP before > MAIYAMOK and a normal space after, and not to include > any leading space in the glyph, but it seems i

What is U+0E46 THAI CHARACTER MAIYAMOK?

2018-02-07 Thread Richard Wordingham via Unicode
I am having trouble identifying just what is represented by ๆ U+0E46 THAI CHARACTER MAIYAMOK. My problem is that the grammatical texts that I have state that when the Thai punctuation mark mai yamok (ไม้ยมก) is used with words, it is flanked by spaces, a position reiterated by the Thai Wikipedia e

Re: Internationalised Computer Science Exercises

2018-02-05 Thread Richard Wordingham via Unicode
On Thu, 1 Feb 2018 19:20:04 + Richard Wordingham via Unicode wrote: > A regular trace expression of the form > > [:ccc=1:][:ccc=2:]…[:ccc=n:] > > seems to require 2^n states in your scheme. As I effectively only > apply the regex to NFD input strings, I use fewer state

Re: Internationalised Computer Science Exercises - Correction

2018-02-01 Thread Richard Wordingham via Unicode
On Thu, 1 Feb 2018 01:38:58 + Richard Wordingham via Unicode wrote: > I believe the concurrent star of a language A is (|A|)*, where > > |A| = {x ∊ A : {x}* is a regular language} > > (The definition works for the trace of fully decomposed Unicode > character strin

Re: Internationalised Computer Science Exercises

2018-02-01 Thread Richard Wordingham via Unicode
On Thu, 1 Feb 2018 08:03:31 +0100 Philippe Verdy via Unicode wrote: > 2018-02-01 2:38 GMT+01:00 Richard Wordingham via Unicode < > unicode@unicode.org>: >> On Wed, 31 Jan 2018 19:45:56 +0100 >> Philippe Verdy via Unicode wrote: >>> 2018-01-29 21:53 GMT+01:

Re: Internationalised Computer Science Exercises

2018-01-31 Thread Richard Wordingham via Unicode
On Wed, 31 Jan 2018 19:45:56 +0100 Philippe Verdy via Unicode wrote: > 2018-01-29 21:53 GMT+01:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > On Mon, 29 Jan 2018 14:15:04 +0100 > > was meant to be an example of a > > searched string. For e

Re: Internationalised Computer Science Exercises

2018-01-29 Thread Richard Wordingham via Unicode
On Mon, 29 Jan 2018 14:15:04 +0100 Philippe Verdy via Unicode wrote: > No since the beginning we were talking about matching strings that are > canonically equivalent within regexps. So that searching for a regexp > containing precombined characters or decomposed characters would find > them indep
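
The idea under discussion here, that a search should find text regardless of whether it is stored precomposed or decomposed, can be illustrated with a small Python sketch. This is only an assumption-laden illustration, not the scheme either poster describes: the helper names are hypothetical, and the search is a plain substring search over NFD text, so it does not handle the harder cases the thread is about (for example, a pattern whose combining marks are interleaved with other marks in the text).

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True if the two strings are canonically equivalent.

    Both sides are converted to NFD, which decomposes precomposed
    characters and puts combining marks into canonical order.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def find_in_nfd(needle: str, haystack: str) -> int:
    """Naive search: index of NFD(needle) within NFD(haystack), or -1.

    The returned index refers to the normalised haystack, not the
    original string. Hypothetical helper for illustration only.
    """
    nfd_needle = unicodedata.normalize("NFD", needle)
    nfd_haystack = unicodedata.normalize("NFD", haystack)
    return nfd_haystack.find(nfd_needle)
```

With these helpers, U+00E9 and the sequence <U+0065, U+0301> compare as equal, and so do two orderings of marks with different combining classes on the same base letter.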

Re: Internationalised Computer Science Exercises

2018-01-29 Thread Richard Wordingham via Unicode
On Mon, 29 Jan 2018 07:16:04 +0100 Philippe Verdy via Unicode wrote: > 2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > In the search you have in mind, the converted regex for use with NFD > > strings is actually

Re: Internationalised Computer Science Exercises

2018-01-28 Thread Richard Wordingham via Unicode
On Sun, 28 Jan 2018 20:29:28 +0100 Philippe Verdy via Unicode wrote: > 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > > On Sat, 27 Jan 2018 14:13:40 -0800The theory > > of regular expressions (though you may not think th

Re: 0027, 02BC, 2019, or a new character?

2018-01-27 Thread Richard Wordingham via Unicode
On Sat, 27 Jan 2018 22:54:57 +0100 (CET) Marcel Schneider via Unicode wrote: > The US-Intl is so weird “you canʼt just leave it on all the time” as > reported in: > > http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html I did (except when I was using a totally different wri

Re: Internationalised Computer Science Exercises

2018-01-27 Thread Richard Wordingham via Unicode
On Sat, 27 Jan 2018 14:13:40 -0800 Shervin Afshar wrote: > On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode < > > unicode@unicode.org> wrote:

Re: 0027, 02BC, 2019, or a new character?

2018-01-26 Thread Richard Wordingham via Unicode
On Fri, 26 Jan 2018 09:08:51 + Andre Schappo via Unicode wrote: > Ah! Yes😀 That is a battle I gave up a long time ago. The database > here can only handle ASCII. I have stopped trying to get the systems > people here to convert the database to UTF-8. Some systems (or admins) have been totall

Re: 0027, 02BC, 2019, or a new character?

2018-01-24 Thread Richard Wordingham via Unicode
On Thu, 25 Jan 2018 07:59:11 +0530 Shriramana Sharma via Unicode wrote: > IMO it's hardly clear that that is or in fact *what* is meant by a > standard keyboard. It merely seems to me loose political speak to > make it appear as if they are trying to make things simpler for the > people. From w

Re: 0027, 02BC, 2019, or a new character?

2018-01-23 Thread Richard Wordingham via Unicode
On Tue, 23 Jan 2018 11:51:42 -0700 Doug Ewell via Unicode wrote: > An explicitly stated goal of the new orthography was to enable typing > Kazakh on a "standard keyboard," meaning an English-language one. > Nazarbayev may ultimately be persuaded to embrace ASCII digraphs, > which also meet this g

Re: 0027, 02BC, 2019, or a new character?

2018-01-23 Thread Richard Wordingham via Unicode
On Wed, 24 Jan 2018 03:22:37 +0800 Phake Nick via Unicode wrote: > >I found the Windows 'US International' keyboard layout highly > >intuitive for accented Latin-1 characters. > How common is the US International keyboard in real life..? I thought it was two copies per new Windows PC - one for

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2018-01-22 Thread Richard Wordingham via Unicode
On Sun, 21 Jan 2018 22:34:12 -0800 Mark Davis ☕️ via Unicode wrote: > I was looking the feedback in http://www.unicode.org/review/pri355/, > and didn't see yours there. Could you please file your feedback > there? (Nothing on this list is tracked by the committee...) This is the submission I hav

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2018-01-22 Thread Richard Wordingham via Unicode
On Sun, 21 Jan 2018 22:34:12 -0800 Mark Davis ☕️ via Unicode wrote: > The ZWJ Virama sequence is already provided for by the combination of > GB9 & GB9c. But not the ZWNJ. If we want to handle that, it would > mean the addition of something like: > > GB9d: × (ZWNJ ViramaExtend* Virama) I don't

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2018-01-22 Thread Richard Wordingham via Unicode
On Sun, 21 Jan 2018 22:34:12 -0800 Mark Davis ☕️ via Unicode wrote: > FYI, I'm thinking now that the change should be: > > GB9c: (Virama | ZWJ ) × LinkingConsonant > => > GB9c: (Virama ViramaExtend* | ZWJ ) × LinkingConsonant > > where ViramaExtend = [Extend - Virama - \p{ccc=0}] > (This i

Re: 0027, 02BC, 2019, or a new character?

2018-01-22 Thread Richard Wordingham via Unicode
On Mon, 22 Jan 2018 11:35:16 +0800 Phake Nick via Unicode wrote: > There > are language-dependent keyboards for French or German with special > keys or deadkeys that help input these umlauts, but they are language > dependent and it is not possible for e.g. a regular American user > using Windows

Re: Internationalised Computer Science Exercises

2018-01-22 Thread Richard Wordingham via Unicode
On Mon, 22 Jan 2018 18:55:16 +0100 Frédéric Grosshans via Unicode wrote: > A simple challenge is to write a function which localize numbers in a > script having decimal digits or parse them (i.e. which have > characters with property Numeric_Type=Decimal, as explained in §4.6 > of the Unicode 10
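
The challenge quoted above (parse or localise numbers written with any script's decimal digits) might be sketched in Python as follows. This is a minimal sketch, not a reference solution: the function names are hypothetical, only non-negative integers are handled, and the localiser assumes that each script's ten decimal digits are encoded contiguously starting from its zero.

```python
import unicodedata


def parse_decimal(text: str) -> int:
    """Parse a run of decimal digits from any script into an int."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # unicodedata.decimal returns 0..9 for decimal digit characters
        # and the supplied default (None) for everything else.
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + digit
    return value


def localize_decimal(number: int, zero: str) -> str:
    """Render a non-negative int using the digits that start at `zero`."""
    if number < 0:
        raise ValueError("only non-negative integers are handled")
    if unicodedata.decimal(zero, None) != 0:
        raise ValueError(f"not a digit zero: {zero!r}")
    # Assumption: digits 1..9 follow the script's zero contiguously.
    return "".join(chr(ord(zero) + int(d)) for d in str(number))
```

For example, passing U+0E50 THAI DIGIT ZERO as `zero` renders 123 with Thai digits, and `parse_decimal` reads the result back as 123.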

Re: Internationalised Computer Science Exercises

2018-01-22 Thread Richard Wordingham via Unicode
On Mon, 22 Jan 2018 16:39:57 + Andre Schappo via Unicode wrote: > By way of example, one programming challenge I set to students a > couple of weeks ago involves diacritics. Please see > jsfiddle.net/coas/wda45gLp Did any of them come up with the idea of

Re: 0027, 02BC, 2019, or a new character?

2018-01-21 Thread Richard Wordingham via Unicode
On Sun, 21 Jan 2018 13:49:46 +0100 Philippe Verdy via Unicode wrote: > But there's NO standard keyboard in Kazakhstan with the Latin > alphabet. Those you'll find are cyrillic keyboards with a way to type > basic Latin. Or keyboards made for other countries. I believe we're talking about physica

Re: Emoji for major planets at least?

2018-01-19 Thread Richard Wordingham via Unicode
On Fri, 19 Jan 2018 02:12:04 +0100 Pierpaolo Bernardi via Unicode wrote: > On Fri, Jan 19, 2018 at 1:19 AM, Aleksey Tulinov via Unicode > wrote: > > Perhaps we all shall stop being ironical to each other, calm down, > > sit and discuss how to encode 3D animated emojies (animojies) in > > Unicode

Re: 0027, 02BC, 2019, or a new character?

2018-01-16 Thread Richard Wordingham via Unicode
On Mon, 15 Jan 2018 23:40:15 -0800 James Kass via Unicode wrote: > On a side note, wouldn't most of the "standard keyboards" currently in > Kazakhstan be labelled in Cyrillic anyway? They're probably already labelled in Cyrillic *and* printable ASCII (US QWERTY). Using the Cyrillic labels for no

Re: 0027, 02BC, 2019, or a new character?

2018-01-16 Thread Richard Wordingham via Unicode
On Mon, 15 Jan 2018 20:16:21 -0800 James Kass via Unicode wrote: > It will probably be the ASCII apostrophe. The stated intent favors > the apostrophe over diacritics or special characters to ensure that > the language can be input to computers with standard keyboards. Typing U+0027 into a word

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-02 Thread Richard Wordingham via Unicode
On Tue, 2 Jan 2018 01:21:37 -0800 Asmus Freytag via Unicode wrote: > On 1/1/2018 6:52 AM, Richard Wordingham via Unicode wrote: > > Generally yes, but I'm not sure that they'd be inappropriate for > > Egyptian hieroglyphs showing human beings. The choice of >

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-01 Thread Richard Wordingham via Unicode
On Mon, 1 Jan 2018 13:24:29 +0530 Manish Goregaokar via Unicode wrote: > sounds very much like a > degenerate case to me. Generally yes, but I'm not sure that they'd be inappropriate for Egyptian hieroglyphs showing human beings. The choice of determinative can convey unpronounceable semantic

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-31 Thread Richard Wordingham via Unicode
On Fri, 22 Dec 2017 17:44:39 +0200 Eli Zaretskii via Unicode wrote: > > Date: Fri, 22 Dec 2017 15:36:35 + > > From: Richard Wordingham via Unicode > > However, it seems > > that one has to modify the source code of Emacs to be able to edit > > in the middle of

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-22 Thread Richard Wordingham via Unicode
On Fri, 22 Dec 2017 17:44:39 +0200 Eli Zaretskii via Unicode wrote: > You can always delete a codepoint at a given position in Emacs, > specifying the position by its number, but there are no user-level > commands to conveniently allow doing that in the middle of a grapheme > cluster. > > It was

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-22 Thread Richard Wordingham via Unicode
On Thu, 21 Dec 2017 22:04:37 -0800 Manish Goregaokar via Unicode wrote: > > When deleting by backspace, the usual practice is to delete one > > Unicode > character for each key press. > > This seems to depend on the operating system and program involved. For > example, on OSX any native text i

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-22 Thread Richard Wordingham via Unicode
On Fri, 22 Dec 2017 09:27:15 +0200 Eli Zaretskii via Unicode wrote: > > Date: Thu, 21 Dec 2017 22:04:37 -0800 > > Cc: Unicode Public > > From: Manish Goregaokar via Unicode > > > > However, Firefox deletes by code point. > > As does Emacs, btw. And deleting in that fashion from the right i

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-21 Thread Richard Wordingham via Unicode
On Thu, 21 Dec 2017 17:55:33 +0900 "Martin J. Dürst via Unicode" wrote: > On 2017/12/15 07:40, Richard Wordingham via Unicode wrote: > > On Mon, 11 Dec 2017 21:45:23 + > > Cibu Johny (സിബു) wrote: > >> For example see the poster with word ഉസ്താദ് broke

Re: Word_Break for Hieroglyphs

2017-12-20 Thread Richard Wordingham via Unicode
On Mon, 18 Dec 2017 15:15:11 +0100 Serge Rosmorduc via Unicode wrote: > Hence, you have things like (like 5-6) : : the word ẖsy « small », > is cut between the two lines. The phonetic part is line 5, and the > bird determinative is alone on line 5, above the preposition « m », > which is itself

Re: Word_Break for Hieroglyphs

2017-12-16 Thread Richard Wordingham via Unicode
On Thu, 14 Dec 2017 15:53:13 +0100 Mark Davis ☕️ via Unicode wrote: > On Thu, Dec 14, 2017 at 3:22 PM, Michael Everson > wrote: > > NO. Clusters cannot be broken up just anywhere. > Does that mean that ancient inscriptions would leave gaps at the end > of lines in order to not break a cluster

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-14 Thread Richard Wordingham via Unicode
On Mon, 11 Dec 2017 21:45:23 + Cibu Johny (സിബു) wrote: > I am assuming the purpose of the grapheme cluster definition is to be > used line spacing, vertical writing or cursor movement. Without > defining the purpose, it is hard for me to say if a ruleset is valid > or not. Assuming that pur

Re: Word_Break for Hieroglyphs

2017-12-14 Thread Richard Wordingham via Unicode
On Thu, 14 Dec 2017 18:11:33 + Andrew Glass via Unicode wrote: > We had some discussion on the sidelines of > the August UTC meeting at which time it became clear that more work > is needed as current property values are not entirely correct. > Currently, my Hieroglyphic energies are focused

Word_Break for Hieroglyphs

2017-12-14 Thread Richard Wordingham via Unicode
Is there any valid reason for Egyptian hieroglyphs to have Word_Break=ALetter rather than Complex_Context? So far as I am aware, hieroglyphs lack visible word breaks in both inscriptions and in modern transcriptions. Richard.

Atomicity of Grapheme Clusters

2017-12-13 Thread Richard Wordingham via Unicode
I have been reviewing UAX#29 Unicode Text Segmentation because I have a feeling we will be trying to do too much with the concept of grapheme clusters, even with tailoring, when we extend it to include whole aksharas. What is the meaning of "Word boundaries, line boundaries, and sentence boundarie

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
On Mon, 11 Dec 2017 21:45:23 + Cibu Johny (സിബു) wrote: > I am assuming the purpose of the grapheme cluster definition is to be > used line spacing, vertical writing or cursor movement. Without > defining the purpose, it is hard for me to say if a ruleset is valid > or not. That is a very fa

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
don't work (or need > more research before #29 is finalized in May), it is fairly > straightforward to restrict the rule changes by modifying > http://www.unicode.org/reports/tr29/proposed.html#Virama to either > exclude particular scripts or include only particular scripts. > > Ma

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
On Mon, 11 Dec 2017 08:59:20 +0100 Mark Davis ☕️ via Unicode wrote: > The proposed rules do not distinguish the different visual forms that > a sequence of characters surrounding a virama can have, such as > >1. an explicit virama, or >2. a half-form is visible, or >3. a ligature is

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
On Sun, 10 Dec 2017 21:14:18 -0800 Manish Goregaokar via Unicode wrote: > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant > > You can also explicitly request ligatureification with a ZWJ, so > perhaps this rule should be something like > > (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant >

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-09 Thread Richard Wordingham via Unicode
On Sat, 9 Dec 2017 16:16:44 +0100 Mark Davis ☕️ via Unicode wrote: > 1. You make a good point about the GB9c. It should probably instead be > something like: > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant > > > Extend is a broader than necessary, and there are a few items that > have c

Re: Aquaφοβία

2017-12-09 Thread Richard Wordingham via Unicode
On Sat, 9 Dec 2017 16:08:22 +0100 Philippe Verdy wrote: > 2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > > Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0 > > implies that it might be considered des

Aquaφοβία

2017-12-09 Thread Richard Wordingham via Unicode
Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0 implies that it might be considered desirable to have a word boundary in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C, U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which should be <006C, U+0310 COM

Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-08 Thread Richard Wordingham via Unicode
Apart from the likely but unmandated consequence of making editing Indic text more difficult (possibly contrary to the UK's Equality Act 2010), there is another difficulty that will follow directly from the currently proposed expansion of grapheme clusters (https://www.unicode.org/reports/tr29/prop

Re: Minimal Implementation of Unicode Collation Algorithm

2017-12-04 Thread Richard Wordingham via Unicode
On Mon, 4 Dec 2017 12:48:11 -0800 Markus Scherer via Unicode wrote: > On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > Would an implementation that supported no characters be compliant? > I guess so. I assume that wo

Minimal Implementation of Unicode Collation Algorithm

2017-12-04 Thread Richard Wordingham via Unicode
May a collation algorithm that always compares all strings as equal be a compliant implementation of the Unicode Collation Algorithm (UTS #10)? If not, by which clause is it not compliant? Formally, this algorithm would require that all weights be zero. Would an implementation that supported no c

Re: ASCII v Unicode

2017-11-12 Thread Richard Wordingham via Unicode
On Fri, 3 Nov 2017 02:36:43 -0700 Asmus Freytag via Unicode wrote: > On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote: > > You may > find https://twitter.com/andreschappo/status/926163719331176450 amusing > 😀 > > André Schappo > > You're wildly off in your page count. > > The "book" part

Re: Normalise Tai Tham or not?

2017-10-11 Thread Richard Wordingham via Unicode
On Wed, 11 Oct 2017 13:10:26 +0300 Eli Zaretskii via Unicode wrote: > > Date: Tue, 10 Oct 2017 21:51:55 +0100 > > From: Richard Wordingham via Unicode > > > > > Emacs lately introduced character-folding in searches, but it's > > > turned off by defa

Re: Normalise Tai Tham or not?

2017-10-10 Thread Richard Wordingham via Unicode
On Tue, 10 Oct 2017 22:46:20 +0300 Eli Zaretskii via Unicode wrote: > > Date: Tue, 10 Oct 2017 20:00:12 +0100 > > From: Richard Wordingham via Unicode > > > > 4) The pressure on search tools to respect canonical equivalence is > > now relatively low. Some editors

Normalise Tai Tham or not?

2017-10-10 Thread Richard Wordingham via Unicode
I'm preparing to share a spell-checker for Northern Thai in the Tai Tham script, and I'm having difficulty deciding whether to offer corrections in NFC/NFD or unnormalised. The problem arises in closed syllables with tone marks. For example, ᨠᩥ᩠᩵ᨶ /kin/ 'smell', has two canonically equivalent enc

Western numeral diacritics in complex scripts

2017-09-17 Thread Richard Wordingham via Unicode
In philological work, one encounters the problem that two or more abstract characters have only same 'natural' transliteration; the same problem can apply to reconstructed phonemes, where there is no sound indication of the actual pronunciation. A common solution is to use a subscript or superscrip

Re: Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-27 Thread Richard Wordingham via Unicode
On Sun, 27 Aug 2017 19:55:31 +0200 Philippe Verdy via Unicode wrote: > 2017-08-27 6:06 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > Canonical reordering is unambiguously referring to the canonical > equivalences in TUS. These are automated and can

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 09:36:44 -0400 John W Kennedy wrote: > Just a reminder that in Apple’s Swift a “Character” is anything that > looks like a character, including a letter with any theoretically > unlimited stack of diacritics, a flag, or a skin-toned emoji, and all > Swift functions working wit

Re: Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 21:52:19 +0200 Philippe Verdy via Unicode wrote: > 2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > Of course SHY in this use is not suitable, but who knows if one will > not need this to split in two parts what wo

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 21:20:45 +0300 Eli Zaretskii via Unicode wrote: > > Date: Sat, 26 Aug 2017 18:52:03 +0100 > > From: Richard Wordingham via Unicode > We are miscommunicating. My point was that programming for MS-Windows > needs a good understanding of what the UTF-16 surr

Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 01:24:36 +0200 Philippe Verdy via Unicode wrote: > 2017-08-17 22:37 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > > Fortunately, there is no good evidence that the occurrence > > of multiple distinct left matras is

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 18:55:25 +0300 Eli Zaretskii via Unicode wrote: > > Date: Sat, 26 Aug 2017 16:09:33 +0100 > > From: Richard Wordingham via Unicode > > It shouldn't. UTF-16 works just like UTF-8, except that the code > > units are bigger. > Not
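
The quoted remark that UTF-16 works like UTF-8 with bigger code units can be made concrete by counting code units per character. The following Python sketch is illustrative only; `code_units` is a hypothetical helper, and big-endian UTF-16 is chosen simply so that no byte order mark is added.

```python
def code_units(ch: str) -> tuple[int, int]:
    """Return (UTF-8 code units, UTF-16 code units) for one character."""
    utf8_units = len(ch.encode("utf-8"))           # 8-bit code units
    utf16_units = len(ch.encode("utf-16-be")) // 2  # 16-bit code units
    return utf8_units, utf16_units
```

A BMP character such as U+0E46 takes three UTF-8 code units but one UTF-16 code unit, while a supplementary-plane character such as U+1F600 takes four UTF-8 code units and two UTF-16 code units (a surrogate pair). In both encodings a single character may span several code units, which is the sense in which they behave alike.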

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 09:36:00 +0300 Eli Zaretskii via Unicode wrote: > > Date: Fri, 25 Aug 2017 00:23:40 +0100 > > From: Richard Wordingham via Unicode > > > > On Thu, 24 Aug 2017 17:17:10 + > > Andre Schappo via Unicode wrote: > > > > &g

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 12:57:37 +0100 (BST) William_J_G Overington via Unicode wrote: > UTF-16 is very useful. I use it in my research project. > If the byte content of a UTF-16 file is displayed in a hexadecimal > display then for all plane 0 characters the byte content of the > character codes ar

Re: Unicode education in Schools

2017-08-24 Thread Richard Wordingham via Unicode
On Thu, 24 Aug 2017 17:17:10 + Andre Schappo via Unicode wrote: > So, I consider it important to familiarise students with SMP > characters as well as BMP characters. Then when they develop software > they will, at the start, be thinking beyond ASCII and Unicode BMP > characters. Just steer

Re: Version linking?

2017-08-17 Thread Richard Wordingham via Unicode
On Thu, 17 Aug 2017 18:34:56 +0530 Shriramana Sharma via Unicode wrote: > Thanks for your reply, but how can characters be used portably if they > are not part of the published standard yet? Or is it that hereafter > both Unicode Standard + Unicode Emoji Standard will be parallelly > portable or

Re: Turtle Graphics Emoji

2017-07-29 Thread Richard Wordingham via Unicode
On Fri, 28 Jul 2017 13:22:22 +0100 (BST) William_J_G Overington via Unicode wrote: > I have been thinking about having Turtle Graphics Emoji as an > educational and fun idea. I trust you are aware of the widespread feeling that there is already an excessive number of turtle characters in Unicode

Re: Unicode education in UK Schools

2017-07-08 Thread Richard Wordingham via Unicode
On Sat, 8 Jul 2017 09:04:39 -0700 Asmus Freytag via Unicode wrote: > But some handling > of combining mark (and also the new emoji sequences) would equally > constitute "basic" knowledge, with the Unicode algorithms like > sorting, Which major applications actually use the Unicode Collation Algo

Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-07-02 Thread Richard Wordingham via Unicode
On Sat, 01 Jul 2017 09:51:00 +0300 "a.lukyanov via Unicode" wrote: > Is it possible to design fonts that will render ẞ as SS? > > So we could choose between ẞ and SS by just selecting the proper > font, without changing the text itself. > > Or perhaps there will be a "font feature" to select th

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-05 Thread Richard Wordingham via Unicode
On Mon, 5 Jun 2017 13:08:06 +0900 "Martin J. Dürst via Unicode" wrote: > On 2017/06/02 04:54, Doug Ewell via Unicode wrote: > > Richard Wordingham wrote: > > > >> even supporting 6-byte patterns just in case 20.1 bits eventually > >> turn out not to be enough, > > Sorry to be late with this

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 19:19:51 -0700 Ken Whistler via Unicode wrote: > > and therefore should start a > > sequence of 6 characters. > > That is completely false, and has nothing to do with the current > definition of UTF-8. > > The current, normative definition of UTF-8, in the Unicode Standa

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode wrote: > Well, working from the *current* specification: > > FC 80 80 80 80 80 > and > FF FF FF FF FF FF > > are equal trash, uninterpretable as *anything* in UTF-8. > > By definition D39b, either sequence of bytes, if encountered by a
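
As a rough illustration of the point quoted above, the following Python sketch checks the two byte sequences against a strict UTF-8 decoder. The helper name `is_well_formed_utf8` and the variable names are assumptions made for this example; they come from neither the thread nor the Unicode Standard. The number of U+FFFD characters a decoder substitutes is exactly what the thread is debating, so the sketch does not claim a particular count.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# The two sequences quoted in the message above.
samples = [bytes.fromhex("FC8080808080"), bytes.fromhex("FFFFFFFFFFFF")]

# Neither sequence is accepted by a strict decoder.
results = [is_well_formed_utf8(s) for s in samples]

# With errors="replace", ill-formed input is turned into U+FFFD characters;
# how many are produced depends on the decoder's replacement policy.
replaced = [s.decode("utf-8", errors="replace") for s in samples]

for sample, ok, text in zip(samples, results, replaced):
    print(sample.hex(" "), "well-formed:", ok, "U+FFFD count:", text.count("\ufffd"))
```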

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode wrote: > On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote: > > You were implicitly invited to argue that there was no need to > > handle 5 and 6 byte invalid sequences. > > > > Well, working from

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 01 Jun 2017 12:54:45 -0700 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > > even supporting 6-byte patterns just in case 20.1 bits eventually > > turn out not to be enough, > > Oh, gosh, here we go with this. You were implicitly invited to argue that there was no need

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 12:32:08 +0300 Henri Sivonen via Unicode wrote: > On Wed, May 31, 2017 at 8:11 PM, Richard Wordingham via Unicode > wrote: > > On Wed, 31 May 2017 15:12:12 +0300 > > Henri Sivonen via Unicode wrote: > >> I am not claiming it's too di

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 19:24:04 + Shawn Steele via Unicode wrote: > It seems to me that being able to use a data stream of ambiguous > quality in another application with predictable results, then that > stream should be “repaired” prior to being handed over. Then both > endpoints would be usin

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 17:43:08 + Shawn Steele via Unicode wrote: > There also appears to be a special weight given to > non-minimally-encoded sequences. It would seem to me that none of > these illegal sequences should appear in practice, so we have either: > I do not understand the energy

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 15:12:12 +0300 Henri Sivonen via Unicode wrote: > The write-up mentions > https://bugs.chromium.org/p/chromium/issues/detail?id=662822#c13 . I'd > like to draw everyone's attention to that bug, which is real-world > evidence of a bug arising from two UTF-8 decoders within one

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Fri, 26 May 2017 21:41:49 + Shawn Steele via Unicode wrote: > I totally get the forward/backward scanning in sync without decoding > reasoning for some implementations, however I do not think that the > practices that benefit those should extend to other applications that > are happy with

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Tue, 30 May 2017 16:38:45 -0600 Karl Williamson via Unicode wrote: > Under Best Practices, how many REPLACEMENT CHARACTERs should the > sequence generate? 0, 1, 2, 3, 4 ? > > In practice, how many do parsers generate? See Markus Kuhn's test page http://www.cl.cam.ac.uk/~mgk25/ucs/examples

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Fri, 26 May 2017 11:22:37 -0700 Ken Whistler via Unicode wrote: > On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote: > > The link provided about the PRI doesn't lead to the comments. > > > > PRI #121 (August, 2008) pre-dated the practice of keeping all the > feedback comments togeth

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Richard Wordingham via Unicode
On Tue, 23 May 2017 17:44:49 -0700 Ken Whistler via Unicode wrote: > Ah, but keep in mind, if projecting out to Version 23.0 (in the year > 2030, by our current schedule), there is a significant chance that > particular UCD data files may have morphed into something entirely > different. Reca

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Richard Wordingham via Unicode
On Tue, 23 May 2017 05:29:33 -0700 Asmus Freytag via Unicode wrote: > On 5/23/2017 4:04 AM, Janusz S. Bien via Unicode wrote: > > Quote/Cytat - Manuel Strehl via Unicode (Tue > > 23 May 2017 11:33:24 AM CEST): > > > >> The rising standard in the world of web development (and others) > >> is ca

Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
On Mon, 22 May 2017 17:19:08 -0500 Anshuman Pandey wrote: > I performed several operations on DerivedAge.txt a few months ago. > One basic example here: > > https://pandey.github.io/posts/unicode-growth-UCD-python.html So what happens if you apply it to Unicode Version 10.0? Are the versions s

Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
On Mon, 22 May 2017 15:10:02 -0700 Markus Scherer via Unicode wrote: > On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > > Given two raw values of the Age property, defined in UCD file > > DerivedAge.txt, how is a c

Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
Given two raw values of the Age property, defined in UCD file DerivedAge.txt, how is a computer program supposed to compare them? Apart from special handling for the value "Unassigned" and its short alias "NA", one used to be able to compare short values against short values and long values against
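
One possible approach, sketched in Python, is to parse each raw value into a numeric (major, minor) pair instead of comparing strings. It assumes short values look like "2.0" and long values like "V2_0", as in the UCD property value aliases; the function name `age_key` is hypothetical. The sketch presumes the concern is that plain string comparison stops ordering versions correctly once the major version reaches two digits.

```python
def age_key(raw: str):
    """Map a raw Age value to a (major, minor) tuple, or None if unassigned.

    Accepts short values such as "2.0" and long values such as "V2_0".
    """
    if raw in ("Unassigned", "NA"):
        return None
    text = raw[1:] if raw.startswith("V") else raw
    major, minor = text.replace("_", ".").split(".")
    return (int(major), int(minor))


# String comparison misorders a two-digit major version...
print("V2_0" < "V10_0")                     # False
# ...whereas the parsed keys compare numerically.
print(age_key("V2_0") < age_key("V10_0"))   # True
# Short and long forms of the same version map to the same key.
print(age_key("10.0") == age_key("V10_0"))  # True
```

Callers still need to decide how "Unassigned" should order relative to real versions; the sketch only signals it with None.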

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-18 Thread Richard Wordingham via Unicode
On Thu, 18 May 2017 09:58:43 +0100 Alastair Houghton via Unicode wrote: > On 18 May 2017, at 07:18, Henri Sivonen via Unicode > wrote: > > > > the decision complicates U+FFFD generation when validating UTF-8 by > > state machine. > > It *really* doesn’t. Even if you’re hell bent on using a
