Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Richard Wordingham via Unicode
On Thu, 1 Nov 2018 22:04:40 +0100 Philippe Verdy via Unicode wrote: > The DUCET could have as well used the notation ".none", or > just dropped every "." in its file (provided it contains a data > entry specifying what is the minimum weight used for each level). > This notation is only

Re: A sign/abbreviation for "magister"

2018-11-01 Thread Richard Wordingham via Unicode
On Wed, 31 Oct 2018 11:35:19 -0700 Asmus Freytag via Unicode wrote: > On the other hand, I'm a firm believer in applying certain styling > attributes to things like e-mail or discussion papers. Well-placed > emphasis can make such texts more readable (without requiring that > they pay attention

Re: use vs mention (was: second attempt)

2018-11-01 Thread Richard Wordingham via Unicode
On Wed, 31 Oct 2018 23:35:06 +0100 Piotr Karocki via Unicode wrote: > These are only examples of changes in meaning with or , > not all of these examples can really exist - but, then, another > question: can we know what author means? And as carbon and iodine > cannot exist, then of course CI

Re: A sign/abbreviation for "magister"

2018-11-01 Thread Richard Wordingham via Unicode
On Wed, 31 Oct 2018 14:57:37 -0700 Asmus Freytag via Unicode wrote: > On 10/31/2018 10:18 AM, Marcel Schneider via Unicode wrote: >> Sad that Arabic ² and ³ are still missing. > How about all the other sets of native digits? They might not be in natural use this way! Also, there is the

Logical Order (was: A sign/abbreviation for "magister")

2018-10-30 Thread Richard Wordingham via Unicode
On Tue, 30 Oct 2018 02:47:25 +0100 Philippe Verdy via Unicode wrote: > We are here at the line between what is pure visual encoding (e.g. > using superscript letters), and logical encoding (as done everywhere > else in Unicode with combining sequences; the most well known > exceptions being for

Re: A sign/abbreviation for "magister"

2018-10-30 Thread Richard Wordingham via Unicode
On Tue, 30 Oct 2018 11:43:14 +0000 James Kass via Unicode wrote: > Now what if we were future historians given the task of encoding both > of those strings, from two different sources, and had no idea what > those two strings were supposed to represent?  Wouldn't it be best to > preserve both

Re: A sign/abbreviation for "magister"

2018-10-30 Thread Richard Wordingham via Unicode
On Mon, 29 Oct 2018 12:20:49 -0700 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > I think this is one of the few cases where Multicode may have > > advantages over Unicode. In a mathematical context, aⁿ would be > > interpreted as _a_ applied _n_ times. As to "fⁿ", ambiguity may

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Richard Wordingham via Unicode
On Sun, 28 Oct 2018 20:42:04 +0000 Michael Everson via Unicode wrote: > I like palaeographic renderings of text very much indeed, and in fact > remain in conflict with members of the UTC (who still, alas, do NOT > communicate directly about such matters, but only in duelling ballot > comments)

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Richard Wordingham via Unicode
On Sat, 27 Oct 2018 05:58:38 -0700 Asmus Freytag via Unicode wrote: > On 10/27/2018 4:10 AM, Janusz S. Bień via Unicode wrote: >> you can see 2 occurrences of a symbol which is explicitly explained >> (in Polish) as meaning "Magister". >> First question is: how do you interpret the symbol?

Re: Fallback for Sinhala Consonant Clusters

2018-10-16 Thread Richard Wordingham via Unicode
On Tue, 16 Oct 2018 22:00:18 +1100 Harshula via Unicode wrote: > When a font is missing a glyph during an *explicit* conjunct lookup, > it appears the most accurate solution is to display the missing glyph > symbol. However, I don't believe that that is the most useful solution, and it

Re: Fallback for Sinhala Consonant Clusters

2018-10-15 Thread Richard Wordingham via Unicode
On Tue, 16 Oct 2018 11:59:54 +1100 Harshula via Unicode wrote: > Hi Richard, > > On 16/10/18 6:57 am, Richard Wordingham via Unicode wrote: > > On Tue, 16 Oct 2018 02:47:36 +1100 > > Harshula via Unicode wrote: > > > >> Note, touching letters

Re: Fallback for Sinhala Consonant Clusters

2018-10-15 Thread Richard Wordingham via Unicode
On Tue, 16 Oct 2018 02:47:36 +1100 Harshula via Unicode wrote: > Note, touching letters are formed by , so they should > not be displayed as a fallback for conjuncts. I don't follow that. While the conjuncts with r-, -r and -y are very different to pairs of touching letters, the conjuncts for

Re: Fallback for Sinhala Consonant Clusters

2018-10-15 Thread Richard Wordingham via Unicode
On Mon, 15 Oct 2018 01:55:24 +1100 Harshula via Unicode wrote: > 3) However, what you have observed is an issue with *explicit* > conjunct creation. After the segmentation is completed, the > layout/shaping engine needs to first check if there is a > corresponding lookup for the explicit

Re: Fallback for Sinhala Consonant Clusters

2018-10-14 Thread Richard Wordingham via Unicode
On Sun, 14 Oct 2018 17:15:26 +0900 "Martin J. Dürst via Unicode" wrote: > Hello Richard, > > On 2018/10/14 09:02, Richard Wordingham via Unicode wrote: > > Are there fallback rules for Sinhala consonant clusters? There are > > fallback rules for Devanagari, but I

Fallback for Sinhala Consonant Clusters

2018-10-13 Thread Richard Wordingham via Unicode
Are there fallback rules for Sinhala consonant clusters? There are fallback rules for Devanagari, but I'm not sure if they read across. The problem I am seeing is that the Pali syllable 'ndhe' න්‍ධෙ is being rendered identically to a hypothetical Sinhalese 'nēdha' නේධ , which in NFD is , when
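
The comparison in this post can be reproduced with a minimal Python sketch using the standard `unicodedata` module. The code-point spellings of the two syllables are an assumption about how the strings in the post are encoded (AL-LAKUNA plus ZWJ for the explicit conjunct), and the helper name `code_points` is illustrative only.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Assumed encodings of the two syllables discussed in the post.
NDHE = "\u0DB1\u0DCA\u200D\u0DB0\u0DD9"   # na, al-lakuna, ZWJ, dha, kombuva
NEDHA = "\u0DB1\u0DDA\u0DB0"              # na, diga kombuva, dha

nedha_nfd = unicodedata.normalize("NFD", NEDHA)
print("ndhe       :", code_points(NDHE))
print("nedha      :", code_points(NEDHA))
print("nedha (NFD):", code_points(nedha_nfd))

# Ignoring ZWJ, both strings are built from the same four characters,
# which is consistent with them looking alike when no conjunct is formed.
same_parts = sorted(NDHE.replace("\u200D", "")) == sorted(nedha_nfd)
print("same characters apart from ZWJ:", same_parts)
```

The sketch only inspects the encoded text; whether a given renderer actually draws the two strings identically depends on the font and shaping engine.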

Re: Tamil Brahmi Short Mid Vowels

2018-09-11 Thread Richard Wordingham via Unicode
On Wed, 29 Aug 2018 21:42:57 +0000 Andrew Glass via Unicode wrote: > Thank you Richard and Shriramana for bringing up this interesting > problem. > > I agree we need to fix this. I don’t want to fix this with a font > hack or change to USE cluster rules or properties. I think the right > place

Re: Unicode String Models

2018-09-11 Thread Richard Wordingham via Unicode
On Tue, 11 Sep 2018 21:10:03 +0200 Hans Åberg via Unicode wrote: > Indeed, before UTF-8, in the 1990s, I recall some Russians using > LaTeX files with sections in different Cyrillic and Latin encodings, > changing the editor encoding while typing. Rather like some of the old Unicode list

Re: Unicode String Models

2018-09-09 Thread Richard Wordingham via Unicode
On Sat, 8 Sep 2018 18:36:00 +0200 Mark Davis ☕️ via Unicode wrote: > I recently did some extensive revisions of a paper on Unicode string > models (APIs). Comments are welcome. > > https://docs.google.com/document/d/1wuzzMOvKOJw93SWZAqoim1VUl9mloUxE0W6Ki_G23tw/edit# Theoretically at least,

Re: UCD in XML or in CSV?

2018-09-01 Thread Richard Wordingham via Unicode
On Fri, 31 Aug 2018 10:36:45 +0200 Manuel Strehl via Unicode wrote: > For me it's currently much easier to have all the data in a single > place, e.g. a large XML file, than spread over a multitude of files > _with different ad-hoc syntaxes_. > > The situation would possibly be different,

Emacs Verbose Character Entry (was Private Use Areas)

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 21:47:03 +0200 "Janusz S. Bień via Unicode" wrote: > My needs are very simple, for example C-x 8 Return LATIN CAPITAL > LETTER A WITH MACRON AND BREVE [MUFI] should yield the character with > the code E010. I can provide the list of names and codes. While it should obviously
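
As a rough illustration of the kind of lookup being asked for (written in Python rather than Emacs Lisp, and not a description of how Emacs implements `C-x 8 RET`), a user-supplied name table could be consulted before the standard character names. The table `PRIVATE_NAMES` and the function `char_from_name` are hypothetical; the single entry is the example given in the post.

```python
import unicodedata

# Hypothetical user-supplied table of private-use names to code points.
PRIVATE_NAMES = {
    "LATIN CAPITAL LETTER A WITH MACRON AND BREVE [MUFI]": 0xE010,
}

def char_from_name(name):
    """Resolve a character name, trying the private table first."""
    key = name.strip().upper()
    if key in PRIVATE_NAMES:
        return chr(PRIVATE_NAMES[key])
    # Fall back to standard Unicode names; raises KeyError if unknown.
    return unicodedata.lookup(key)

pua_char = char_from_name("LATIN CAPITAL LETTER A WITH MACRON AND BREVE [MUFI]")
std_char = char_from_name("LATIN CAPITAL LETTER A WITH MACRON")
print(f"U+{ord(pua_char):04X}", f"U+{ord(std_char):04X}")
```

Checking the private table first means a private name can never be shadowed by a later addition to the standard names, which seems the safer default for a sketch like this.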

Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 20:34:20 +0200 "Janusz S. Bień via Unicode" wrote: > This is a typical but IMHO obsolete perspective. Fonts are for > *rendering*, new characters and variants are more and more often > needed for *input* of real life old texts with sufficient precision. If we're talking

Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 17:39:15 +0200 Philippe Verdy via Unicode wrote: > You make a confusion: I do not propose "hacking" existing codes, but > instead adding new codes for private variations. It's then up to PUV > sequence authors to choose an appropriate base character that can > have the

Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 14:10:35 +0200 "Janusz S. Bień via Unicode" wrote: > What kind of software do you have in mind? > > I'm primarily interested in the locally developed programs > > https://bitbucket.org/jsbien/unihistext/ > > https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/

Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Wed, 22 Aug 2018 11:58:58 +0200 Philippe Verdy via Unicode wrote: > For now there's still no way to have variant sequences unless they are > registered and standardized by Unicode but registration should be not > needed (forbidden) for sequences containing PUV. I believe this scheme is no

Re: Private Use areas

2018-08-21 Thread Richard Wordingham via Unicode
On Tue, 21 Aug 2018 11:03:41 -0700 Ken Whistler via Unicode wrote: > On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote: > Really? Suppose someone wants to implement a bicameral script in PUA. > They would need case mappings for that, and how would those be > "better represented in the font

Re: Private Use areas (was: Re: Thoughts on working with the Emoji Subcommittee (was ...))

2018-08-21 Thread Richard Wordingham via Unicode
On Tue, 21 Aug 2018 08:53:18 +0800 via Unicode wrote: > On 2018-08-21 08:04, Mark E. Shoulson via Unicode wrote: > > Still, maybe it > > doesn't really matter much: your special-purpose font can treat any > > codepoint any way it likes, right? > Not all properties come from the font. For

Re: Unicode 11 Georgian uppercase vs. fonts

2018-07-31 Thread Richard Wordingham via Unicode
On Mon, 30 Jul 2018 19:49:57 -0400 "Mark E. Shoulson via Unicode" wrote: > O blessed gods of writing, you mean yet *another* script wants > (wanted?) to commit the mistake of bicamerality?  Just quit while > you're ahead! WWS describes Javanese as having 'capital letters'. On closer

Re: Unicode 11 Georgian uppercase vs. fonts

2018-07-28 Thread Richard Wordingham via Unicode
On Sat, 28 Jul 2018 19:01:03 +0200 Kent Karlsson via Unicode wrote: > The (proper) case-mapping for ẞ is nowhere to be > found in the Unicode database (which I think is a pity, but that is a > different matter). Actually it is. It is the case-mapping of ß which was disputed. However, unless I've
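
The two mappings mentioned here can be observed from Python, whose `str.lower()` and `str.upper()` apply the case mappings of the Unicode version bundled with the interpreter. This is only a sketch of what one implementation reports; the helper name `hexes` is illustrative.

```python
CAPITAL_SHARP_S = "\u1E9E"  # LATIN CAPITAL LETTER SHARP S
SMALL_SHARP_S = "\u00DF"    # LATIN SMALL LETTER SHARP S

def hexes(text):
    """Format the code points of text as 'U+XXXX' strings."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

lowered = CAPITAL_SHARP_S.lower()   # mapping of the capital letter
uppered = SMALL_SHARP_S.upper()     # mapping of the small letter

print("lower(U+1E9E) =", hexes(lowered))
print("upper(U+00DF) =", hexes(uppered))
```

The capital letter lowercases to the small one, while the small letter uppercases to two letters rather than to U+1E9E, so the pair does not round-trip.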

Re: Unicode 11 Georgian uppercase vs. fonts

2018-07-28 Thread Richard Wordingham via Unicode
On Sat, 28 Jul 2018 01:45:53 +0000 Peter Constable via Unicode wrote: > (iii) gave > indication of intent to develop a plan of action for preparing their > institutions for this change as well as communicating that within > Georgian industry and society. It was only after that did UTC feel it >

Re: Unicode 11 Georgian uppercase vs. fonts

2018-07-27 Thread Richard Wordingham via Unicode
On Fri, 27 Jul 2018 07:00:31 -0700 Asmus Freytag via Unicode wrote: > To get back to Markus' original question on how to handle this for > ICU: it seems more and more that Georgian should be exempted from > standard library functions and that a new function needs to be added > that just

Re: Unicode 11 Georgian uppercase vs. fonts

2018-07-26 Thread Richard Wordingham via Unicode
On Thu, 26 Jul 2018 23:27:08 +0400 Alexey Ostrovsky via Unicode wrote: > Before answering, we must mention the caseless nature of the Georgian > script. Its "capital" letters do not exist as letters, they are letter > variants used exactly the same way as the Latin title case. Therefore, >

Re: Tamil Brahmi Short Mid Vowels

2018-07-21 Thread Richard Wordingham via Unicode
On Sat, 21 Jul 2018 07:55:51 +0530 Shriramana Sharma via Unicode wrote: > This is a unique problem because this is probably the only case where > the same script produces conjuncts for one language and not for > another. There are and have been similar cases. Reformed (a.k.a. 'typewriter')

Tamil Brahmi Short Mid Vowels

2018-07-20 Thread Richard Wordingham via Unicode
A problem has been spotted with the rendering of Tamil Brahmi vowels - in particular the sequence does not conform to the grammar of the Universal Shaping Engine (USE); a dotted circle may be inserted between the vowel and the pulli. When considering font-level remedies, I realised that there

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Richard Wordingham via Unicode
On Wed, 18 Jul 2018 13:43:36 +0000 (UTC) philip chastney via Unicode wrote: > > On Tue, 17/7/18, Richard Wordingham via Unicode > wrote: > > > Subject: Re: UAX #9: applicability of higher-level protocols to > > bidi plaintext

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Richard Wordingham via Unicode
On Mon, 16 Jul 2018 10:53:03 +0300 Shai Berger via Unicode wrote: > What I'm not OK with is: > > !Hello, World > > Which is what you'll see if your editor decides to use RTL > directionality for this file, as the FAQ says it may. Using 'left aligned' for RTL and 'right aligned' for LTR are

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-15 Thread Richard Wordingham via Unicode
On Sat, 14 Jul 2018 12:14:35 -0700 Asmus Freytag via Unicode wrote: > The bidi case is just another such case where you cannot expect any > fidelity in presentation whatsoever. (And certainly not in the case > of degenerate files containing all but one weak character). It's going a bit far to

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-14 Thread Richard Wordingham via Unicode
On Sat, 14 Jul 2018 13:09:11 +0300 Shai Berger via Unicode wrote: > On Fri, 13 Jul 2018 11:22:51 +0300 > Eli Zaretskii via Unicode wrote: > > > > > Different applications will have different needs here, so there's > > definitely a need to provide applications and users with some > > control

Re: The Unicode Standard and ISO

2018-06-09 Thread Richard Wordingham via Unicode
On Sat, 9 Jun 2018 08:23:33 +0200 (CEST) Marcel Schneider via Unicode wrote: > > Where there is opportunity for productive sync and merging with is > > glibc. We have had some discussions, but more needs to be done- > > especially a lot of tooling work. Currently many bug reports are > >

Re: The Unicode Standard and ISO

2018-06-08 Thread Richard Wordingham via Unicode
On Fri, 8 Jun 2018 20:45:26 +0200 Philippe Verdy via Unicode wrote: > 2018-06-08 19:41 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > The way tailoring is designed in CLDR using only data used by a > generic algorithm, and not custom algorithm i

Re: The Unicode Standard and ISO

2018-06-08 Thread Richard Wordingham via Unicode
n, which is a formal problem for U+FDD0. I would expect you to argue that it is more useful for U+FDD0 to have the special behaviour defined in CLDR, and restrict conformance with DUCET to characters other than non-characters. > On Fri, Jun 8, 2018 at 10:41 AM, Richard Wordingham via Unicode < >

Re: The Unicode Standard and ISO

2018-06-08 Thread Richard Wordingham via Unicode
On Fri, 8 Jun 2018 13:40:21 +0200 Mark Davis ☕️ wrote: > Mark > > On Fri, Jun 8, 2018 at 10:06 AM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > > On Fri, 8 Jun 2018 05:32:51 +0200 (CEST) > > Marcel Schneider via Unicode wrote: > >

Re: The Unicode Standard and ISO

2018-06-08 Thread Richard Wordingham via Unicode
On Fri, 8 Jun 2018 05:32:51 +0200 (CEST) Marcel Schneider via Unicode wrote: > Thank you for confirming. All witnesses concur to invalidate the > statement about uniqueness of ISO/IEC 10646 ‐ Unicode synchrony. — > After being invented in its actual form, sorting was standardized >

Re: Hyphenation Markup

2018-06-07 Thread Richard Wordingham via Unicode
On Sat, 2 Jun 2018 05:44:29 +0100 Richard Wordingham via Unicode wrote: > In Latin text, one can indicate permissible line break opportunities > between grapheme clusters by inserting U+00AD SOFT HYPHEN. What > low-end schemes, if any, exist for such mark-up within grapheme &

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Richard Wordingham via Unicode
On Thu, 7 Jun 2018 10:42:46 +0200 Mark Davis ☕️ via Unicode wrote: > > The proposal also asks for identifiers to be treated as equivalent > > under > NFKC. > > The guidance in #31 may not be clear. It is not to replace > identifiers as typed in by the user by their NFKC equivalent. It is >

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Richard Wordingham via Unicode
On Thu, 7 Jun 2018 13:32:13 +0200 Joan Montané via Unicode wrote: > 2018-06-04 21:49 GMT+02:00 Manish Goregaokar via Unicode < > unicode@unicode.org>: > * Ŀ, LATIN CAPITAL LETTER L WITH MIDDLE DOT NFKC decomposes > to LATIN CAPITAL LETTER L (U+004C) MIDDLE DOT (U+00B7): > * ŀ, LATIN SMALL
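
The decomposition quoted here can be checked with Python's standard `unicodedata` module. This is a small sketch; the helper name `nfkc_points` is illustrative and the result reflects the Unicode data shipped with the interpreter.

```python
import unicodedata

def nfkc_points(text):
    """Return the NFKC form of text as a list of 'U+XXXX' strings."""
    normalized = unicodedata.normalize("NFKC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]

# U+013F and U+0140: capital and small L with middle dot.
for ch in ("\u013F", "\u0140"):
    print(f"U+{ord(ch):04X} ->", " ".join(nfkc_points(ch)))
```

In both cases a single letter becomes a base letter followed by U+00B7 MIDDLE DOT, which is the sequence whose identifier status the thread is asking about.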

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Richard Wordingham via Unicode
On Tue, 5 Jun 2018 01:37:47 +0100 Richard Wordingham via Unicode wrote: > The decomposed > form that looks the same is นํ้า . > The problem is that for sane results, needs > special handling. This sequence is also often untypable - part of the > protection against Thai homogra

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Richard Wordingham via Unicode
On Mon, 4 Jun 2018 12:49:20 -0700 Manish Goregaokar via Unicode wrote: > Hi, > > The Rust community is considering > adding non-ascii > identifiers, which follow UAX #31 > (XID_Start XID_Continue*, with >

Re: Requiring typed text to be NFKC

2018-06-06 Thread Richard Wordingham via Unicode
On Tue, 5 Jun 2018 19:48:53 -0700 Manish Goregaokar via Unicode wrote: > Following up from my previous email > , > one of the ideas that was brought up was that if we're going to > consider NFKC forms equivalent, we should require

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Richard Wordingham via Unicode
On Mon, 4 Jun 2018 12:49:20 -0700 Manish Goregaokar via Unicode wrote: > Hi, > > The Rust community is considering > adding non-ascii > identifiers, which follow UAX #31 > (XID_Start XID_Continue*, with >

Re: Hyphenation Markup

2018-06-03 Thread Richard Wordingham via Unicode
On Sun, 3 Jun 2018 04:31:32 +0100 Richard Wordingham via Unicode wrote: > However, the text is actually in the Tham script, and without any > line-breaking controls, the first and third examples read, marking the > grapheme cluster boundaries with '|', as ᨾ᩠ᨿᩮ MA, U+1A60 TAI THAM SIGN

Re: Hyphenation Markup

2018-06-02 Thread Richard Wordingham via Unicode
On Sat, 2 Jun 2018 14:33:01 -0600 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > >> What about U+200B ZWSP? > > > > Thanks for the suggestion, but it's not likely to work: > > Are you asking what schemes exist, or are you trying to call > attention to some rendering engine

Re: Hyphenation Markup

2018-06-02 Thread Richard Wordingham via Unicode
On Sat, 2 Jun 2018 11:06:43 +0200 Otto Stolz via Unicode wrote: > Am 2018-06-02 um 06:44 schrieb Richard Wordingham via Unicode: > > In Latin text, one can indicate permissible line break opportunities > > between grapheme clusters by inserting U+00AD SOFT HYPHEN. What >

Hyphenation Markup

2018-06-01 Thread Richard Wordingham via Unicode
In Latin text, one can indicate permissible line break opportunities between grapheme clusters by inserting U+00AD SOFT HYPHEN. What low-end schemes, if any, exist for such mark-up within grapheme clusters? The visual effect I wish to enable can be presented simply as: line-break
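
For the Latin case described in the first sentence, a minimal Python sketch of the mark-up might look as follows. The syllable division of the sample word and the helper names are assumptions made for illustration, and the sketch does not address the question actually being asked, namely breaks inside a grapheme cluster.

```python
SOFT_HYPHEN = "\u00AD"  # U+00AD SOFT HYPHEN

def mark_breaks(parts):
    """Join word fragments with U+00AD to mark permitted break points."""
    return SOFT_HYPHEN.join(parts)

def strip_breaks(text):
    """Remove the break mark-up again, e.g. before searching or comparing."""
    return text.replace(SOFT_HYPHEN, "")

# Illustrative division of a sample word into fragments.
marked = mark_breaks(["hy", "phen", "a", "tion"])
print(len(marked), [f"U+{ord(ch):04X}" for ch in marked if ch == SOFT_HYPHEN])
print(strip_breaks(marked))
```

Because U+00AD sits between whole grapheme clusters here, it can be inserted and removed without disturbing any combining sequence.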

Character Boundaries - Who is to choose?

2018-05-31 Thread Richard Wordingham via Unicode
This has nothing to do with grapheme boundaries. A few days ago, I remarked that deciding whether two character usages were of the same character was akin to deciding whether two populations were of the same species. It can also be difficult to decide where the boundary between two species lies.

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-29 Thread Richard Wordingham via Unicode
On Tue, 29 May 2018 14:03:25 -0700 Doug Ewell via Unicode wrote: > In any case, Ken has answered the real underlying question: a process > that checks whether each character in a sequence is "alphabetic" is > inappropriate for determining whether the sequence constitutes a word. Back in the

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-29 Thread Richard Wordingham via Unicode
On Tue, 29 May 2018 07:27:21 -0700 Ken Whistler via Unicode wrote: > On 5/29/2018 12:49 AM, Richard Wordingham via Unicode wrote: > > How would one know that they are misapplied? And what if the > > author of the text has broken your rules? Are such texts never to > > be

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-29 Thread Richard Wordingham via Unicode
On Mon, 28 May 2018 16:13:43 -0600 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > > The effects of virama that spring to mind are: > > > > (a) Causing one or both letters on either side to change or combine > > to indicate combination; > > > > (b) Appearing as a mark only if it

Re: Unicode characters unification

2018-05-29 Thread Richard Wordingham via Unicode
On Mon, 28 May 2018 21:40:49 -0700 Asmus Freytag via Unicode wrote: > But such exceptions prove the rule, which leads back to where we > started: the default position is that Unicode encodes a character > identity that is not the same as encoding the concept that said > character is used to

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-29 Thread Richard Wordingham via Unicode
On Mon, 28 May 2018 22:02:15 -0700 Ken Whistler via Unicode wrote: > On 5/28/2018 9:44 PM, Asmus Freytag via Unicode wrote: > > One of the general principles is that combining marks inherit the > > property of their base character. > > > > Normally, "inherited" should be the only property value

Re: Unicode characters unification

2018-05-28 Thread Richard Wordingham via Unicode
On Mon, 28 May 2018 21:14:58 +0200 Hans Åberg via Unicode wrote: > > On 28 May 2018, at 21:01, Richard Wordingham via Unicode > > wrote: > > > > On Mon, 28 May 2018 20:19:09 +0200 > > Hans Åberg via Unicode wrote: > > > >> Indistinguishable

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Richard Wordingham via Unicode
On Mon, 28 May 2018 20:03:11 +0530 SundaraRaman R via Unicode wrote: > Hi, thanks for your reply. > > > There is only one character with a canonical combining class of 9 > > that is included as other_alphabetic, namely U+0E3A THAI CHARACTER > > PHINTHU. That last had any of

Re: preliminary proposal: New Unicode characters for Arabic music half-flat and half-sharp symbols

2018-05-28 Thread Richard Wordingham via Unicode
On Mon, 28 May 2018 10:08:30 +0200 Hans Åberg via Unicode wrote: > > On 28 May 2018, at 03:39, Garth Wallace wrote: > > The fact that they do not denote the same width in cents in Arabic > > music as they do in Western modern classical does not matter.

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Richard Wordingham via Unicode
On Mon, 28 May 2018 00:57:03 +0530 SundaraRaman R via Unicode wrote: > Hi, > > In languages like Ruby or Java > (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)), > functions to check if a character is alphabetic do that by looking for >
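
Python has no direct accessor for the Unicode Alphabetic property, so the sketch below uses `str.isalpha()`, which is based on general category (letters only), as a stand-in for a per-character alphabetic test. It is not the Java method linked above and may classify other characters differently; the helper name `describe` is illustrative.

```python
import unicodedata

def describe(ch):
    """Collect a few character properties exposed by the standard library."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "ccc": unicodedata.combining(ch),
        "isalpha": ch.isalpha(),
    }

KA = "\u0B95"       # TAMIL LETTER KA
PULLI = "\u0BCD"    # TAMIL SIGN VIRAMA (pulli)
word = KA + PULLI   # the consonant written with pulli

for ch in word:
    print(describe(ch))

# A per-character test rejects the sequence because of the pulli.
all_alpha = all(ch.isalpha() for ch in word)
print("every character passes isalpha():", all_alpha)
```

The letter passes and the combining sign does not, so any check that requires every character to be "alphabetic" will reject ordinary Tamil words.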

Re: Major vendors changing U+1F52B PISTOL depiction from firearm to squirt gun

2018-05-23 Thread Richard Wordingham via Unicode
On Wed, 23 May 2018 10:59:02 -0700 Ken Whistler via Unicode wrote: > If you want stable and accurate > conveyance of particular meaning -- well, write it out in the > standard orthography of a particular language. Preferably not of a living language, though even the

Re: Major vendors changing U+1F52B PISTOL depiction from firearm to squirt gun

2018-05-23 Thread Richard Wordingham via Unicode
On Wed, 23 May 2018 20:08:31 +0300 via Unicode wrote: > I’d treat these as glyph changes within fonts. I'd treat them as gross violations of character identity. Richard.

Re: Extended grapheme cluster stability

2018-05-22 Thread Richard Wordingham via Unicode
On Tue, 22 May 2018 14:43:23 +0200 Martinho Fernandes via Unicode wrote: > On 22.05.18 12:51, Martinho Fernandes via Unicode wrote: > > > Hello, > > > > None of the *_Break properties are stable, as far as I can see in > >

Re: Choosing the Set of Renderable Strings

2018-05-18 Thread Richard Wordingham via Unicode
On Tue, 15 May 2018 04:19:42 -0800 James Kass via Unicode <unicode@unicode.org> wrote: > On Mon, May 14, 2018 at 11:31 AM, Richard Wordingham via Unicode > <unicode@unicode.org> wrote: > > I've seen an implementation of the USE render > > canonically

Re: Choosing the Set of Renderable Strings

2018-05-18 Thread Richard Wordingham via Unicode
On Thu, 17 May 2018 23:38:27 -0800 James Kass via Unicode wrote: > I wrote, > > > Changing the entry order to: > > ᨽᩮᩣᨾᨶᩣᩮ > > > > ... forms the NAA ligature and the vowel re-ordering matches the > > Lamphun graphic you sent. But that kludge probably breaks the > >

Re: Choosing the Set of Renderable Strings

2018-05-18 Thread Richard Wordingham via Unicode
On Thu, 17 May 2018 21:50:38 -0800 James Kass via Unicode wrote: > Richard Wordingham wrote, > > ⇒ Your example appears to be using the font called 'A Tai Tham KH > New'. > > Exactly. The black boxes in the display were becoming tiresome. The > font package is available

Re: L2/18-181

2018-05-17 Thread Richard Wordingham via Unicode
On Wed, 16 May 2018 13:46:22 -0700 Doug Ewell via Unicode wrote: > http://www.unicode.org/L2/L2018/18181-n4947-assamese.pdf > > This is a fascinating proposal to disunify the Assamese script from > Bengali on the following bases: According to the proposal, the encoding for

Re: L2/18-181

2018-05-17 Thread Richard Wordingham via Unicode
On Thu, 17 May 2018 11:43:00 -0700 Doug Ewell via Unicode wrote: > It is the same for Bengali and Assamese, although the > language-specific subsets are called abugidas instead of alphabets. If we allow an abugida to be different to an alphasyllabary, then, in Thailand,

Re: how to make custom combining diacritical marks for arabic letters?

2018-05-17 Thread Richard Wordingham via Unicode
On Thu, 17 May 2018 09:49:55 +0300 dinar qurbanov via Unicode wrote: > how to make custom combining diacritical marks for arabic letters? > should only font drivers and programs support it, or should also > unicode support it, for example, have special area for them? > > as

Re: how to make custom combining diacritical marks for arabic letters?

2018-05-17 Thread Richard Wordingham via Unicode
On Thu, 17 May 2018 08:43:35 -0800 James Kass via Unicode wrote: > This page describes the essentials of OpenType Arabic font > development: > > https://docs.microsoft.com/en-us/typography/script-development/arabic But isn't the problem that PUA diacritics won't reach most

Re: Choosing the Set of Renderable Strings

2018-05-17 Thread Richard Wordingham via Unicode
On Wed, 16 May 2018 22:39:36 +0100 Richard Wordingham via Unicode <unicode@unicode.org> wrote: > As an > example of correct rendering, I include the Pali for 'O mind!', _bho > mano_, encoded , > as rendered by the Lamphun font. Sorry, wrong sequence, wrong font. The correct s

Re: L2/18-181

2018-05-16 Thread Richard Wordingham via Unicode
On Thu, 17 May 2018 01:24:09 +0100 Michael Everson via Unicode wrote: > It sounds to me like a fault in the keyboard software, which could be > fixed by the people who own and maintain that software. We had this discussion a few years ago. See

Re: L2/18-181

2018-05-16 Thread Richard Wordingham via Unicode
On Wed, 16 May 2018 17:41:12 -0500 Anshuman Pandey via Unicode wrote: > > 3. Keyboard design is more difficult because consonants like ক্ষ > > are encoded as conjunct forms instead of atomic characters. > > Ignorant question on my part: is it difficult to use character >

Re: L2/18-181

2018-05-16 Thread Richard Wordingham via Unicode
On Thu, 17 May 2018 00:34:35 +0100 Michael Everson via Unicode <unicode@unicode.org> wrote: > This is not a fault of the encoding. > > > On 16 May 2018, at 23:01, Richard Wordingham via Unicode > > <unicode@unicode.org> wrote: > > > > I think simple

Re: L2/18-181

2018-05-16 Thread Richard Wordingham via Unicode
On Wed, 16 May 2018 13:46:22 -0700 Doug Ewell via Unicode wrote: > http://www.unicode.org/L2/L2018/18181-n4947-assamese.pdf > > This is a fascinating proposal to disunify the Assamese script from > Bengali on the following bases: > 3. Keyboard design is more difficult

Re: Choosing the Set of Renderable Strings

2018-05-16 Thread Richard Wordingham via Unicode
On Wed, 16 May 2018 05:23:08 -0800 James Kass via Unicode wrote: > Note that although the proposal gave canonical combining class > zero to both the tone marks and the vowel signs, the on-line Unicode > data gives canonical combining class 230 to the tone marks. There were

Complete Definition of Each Supported Script

2018-05-15 Thread Richard Wordingham via Unicode
I just found this assertion in https://en.wikipedia.org/wiki/Uniscribe: "Microsoft worked with the Unicode Technical Committee to make shaping requirements available in a machine readable format, so a complete definition of each supported script will be included in the Unicode standard and

Re: Choosing the Set of Renderable Strings

2018-05-15 Thread Richard Wordingham via Unicode
On Tue, 15 May 2018 06:04:45 -0800 James Kass via Unicode wrote: > Display behaviour which is script-specific should be handled by the > rendering/shaping engine. Only that which is font-specific should be > handled by the font. That makes a lot of sense. Unfortunately,

Re: Choosing the Set of Renderable Strings

2018-05-15 Thread Richard Wordingham via Unicode
On Tue, 15 May 2018 04:19:42 -0800 James Kass via Unicode <unicode@unicode.org> wrote: > On Mon, May 14, 2018 at 11:31 AM, Richard Wordingham via Unicode > <unicode@unicode.org> wrote: > > > ... One could argue that the three positions require > > different gl

Re: Choosing the Set of Renderable Strings

2018-05-15 Thread Richard Wordingham via Unicode
On Tue, 15 May 2018 02:18:11 -0800 James Kass via Unicode wrote: > Richard Wordingham replied, > > >> ...Private Use Area... > > > > That's what the Xishuangbanna News does for final consonants. > > I failed to find a link for their web site, but only spent about an > hour

Re: Choosing the Set of Renderable Strings

2018-05-14 Thread Richard Wordingham via Unicode
On Mon, 14 May 2018 04:12:56 -0800 James Kass via Unicode wrote: > In response to William Overington's post, it's easier to transcode > data from a PUA scheme into Unicode than it is to enter the data from > scratch. (The same could be said for a customized ASCII font.)

Re: Choosing the Set of Renderable Strings

2018-05-14 Thread Richard Wordingham via Unicode
On Mon, 14 May 2018 08:55:05 +0100 (BST) William_J_G Overington via Unicode wrote: > One possibility that might be worth consideration is to map each > otherwise unmapped glyph in the font each to a distinct code point in > the Private Use Area. This being as well as all of

Re: Choosing the Set of Renderable Strings

2018-05-14 Thread Richard Wordingham via Unicode
On Sun, 13 May 2018 22:15:10 -0800 James Kass via Unicode wrote: > Richard Wordingham asked, > > > Is this a reasonable approach to allowing both collation > > and suppressing needless homographs? My contribution to > > the rendering is only the provision of a font. > >

Lack of ulUnicodeRange Bit for Adlam

2018-05-12 Thread Richard Wordingham via Unicode
On Tue, 27 Feb 2018 11:45:36 -0500 Neil Patel via Unicode wrote, under topic heading "Re: Unicode Digest, Vol 50, Issue 20": > Does the ulUnicodeRange bits get used to dictate rendering behavior or > script recognition? > > I am just wondering about whether the lack of bits

Choosing the Set of Renderable Strings

2018-05-11 Thread Richard Wordingham via Unicode
For assembling a rendering system for a script with combining marks, is there a guide as to how to decide what strings one should exclude, and which one should strive to support? There will also be characters outside the script that should be supported. For a font, there are lists of characters

Re: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-03 Thread Richard Wordingham via Unicode
On Tue, 3 Apr 2018 03:49:51 +0200 Philippe Verdy via Unicode wrote: > It's fun to consider the introduction (after emojis) of imojis, > amojis, umojis and omojis for individual people (or named pets), > alien species (E.T. wants to be able to call home with his own >

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-08 Thread Richard Wordingham via Unicode
On Thu, 08 Mar 2018 09:42:38 +0800 via Unicode wrote: > to the best of my knowledge virtually no new characters used just for > names are under consideration, all the ones that are under > consideration are from before this century. What I was interested in was the rate of

Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Richard Wordingham via Unicode
On Thu, 8 Mar 2018 02:27:06 +0100 (CET) Marcel Schneider via Unicode wrote: > Yes the biggest issue over time, as Ken wrote, is to *maintain* a > translation, be it only the Nameslist. For which accurately determined change bars can work wonders. An alternative would be

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Richard Wordingham via Unicode
On Mon, 05 Mar 2018 23:42:15 +0800 via Unicode wrote: > In most cases the answer to the above may well be the same, the > unencoded names of people and places are not new names, How many new characters are being devised per year? Richard.

Re: Coloured Characters

2018-02-22 Thread Richard Wordingham via Unicode
On Thu, 22 Feb 2018 10:55:23 +0000 (GMT) William_J_G Overington wrote: > Richard Wordingham wrote: > > > 'Foreground' and 'background' are the only externally defined > > colours. There's no ability to explicitly choose, say 'text stroked > > sable and dotted gules'.

Re: Coloured Characters (was: 0027, 02BC, 2019, or a new character?)

2018-02-21 Thread Richard Wordingham via Unicode
On Thu, 22 Feb 2018 00:04:34 +0100 Philippe Verdy via Unicode wrote: > On the opposite, colored in Arabic or hieroglyph texts is a useful > emphasis and sometimes semantically significant (some rare old > scripts also used distinctive colors): we are in a case similar to

Coloured Characters (was: 0027, 02BC, 2019, or a new character?)

2018-02-21 Thread Richard Wordingham via Unicode
On Wed, 21 Feb 2018 16:28:14 +0100 Philippe Verdy via Unicode wrote: > I even hope that there will be a setting in all browsers, OS'es, > mobiles, and apps to refuse any colorful rendering, and just render > them as monochromatic symbols. In summary, COMPLETELY DISABLE the >

Re: metric for block coverage

2018-02-20 Thread Richard Wordingham via Unicode
On Tue, 20 Feb 2018 15:13:16 +0000 "Dreiheller, Albrecht via Unicode" wrote: > Could someone please supply an example (web link ...) for usage of > danda / double danda in Tamil? Thanks, Albrecht Take your pick from http://www.prapatti.com/slokas/slokasbyname.html . Do they

Re: metric for block coverage

2018-02-19 Thread Richard Wordingham via Unicode
On Mon, 19 Feb 2018 20:02:28 +0100 Philippe Verdy via Unicode wrote: > This pair of punctuation should have been considered since long as > common punctuations (independently of their assigned names), i.e. > assigned the script property "Comn" and not "Deva". I don't see why

Re: Why so much emoji nonsense?

2018-02-18 Thread Richard Wordingham via Unicode
On Sat, 17 Feb 2018 22:31:12 -0800 James Kass via Unicode wrote: > It's true that added features can make for a better presentation. > Removing the special features shouldn't alter the message. I think I've encountered the use of italics in novels for sotto voce or asides.

Re: Unicode of Death 2.0

2018-02-18 Thread Richard Wordingham via Unicode
On Sun, 18 Feb 2018 14:13:22 +0100 Philippe Verdy via Unicode wrote: > But any operation in OpenType that requires reordering requires a > glyphs buffer. This could even apply to Latin if Microsoft really > intends to support normalization (i.e. canonical equivalences) in

Re: metric for block coverage

2018-02-18 Thread Richard Wordingham via Unicode
On Sun, 18 Feb 2018 13:05:29 +0100 Adam Borowski via Unicode wrote: > On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass wrote: > > You probably already know that basic script coverage information is > > stored internally in OpenType fonts in the OS/2 table. > > > >
