Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Richard Wordingham
On Mon, 27 Mar 2017 13:34:09 -0700 Ken Whistler wrote: > And if a flag of > California (or Pomerania or ...) then gets added to the list of emoji > tag sequences in a future version of the data, there is a good chance > that the "users" will then see the difference, because

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Richard Wordingham
On Tue, 14 Mar 2017 08:51:18 + Alastair Houghton <alast...@alastairs-place.net> wrote: > On 14 Mar 2017, at 02:03, Richard Wordingham > <richard.wording...@ntlworld.com> wrote: > > > > On Mon, 13 Mar 2017 19:18:00 + > > Alastair Houghton

Re: Proposal to add standardized variation sequences for chess notation

2017-04-03 Thread Richard Wordingham
On Tue, 4 Apr 2017 00:30:30 +0100 Michael Everson wrote: > On 3 Apr 2017, at 23:07, Asmus Freytag (c) > wrote: > You want WHITE CHESS KNIGHT, and WHITE CHESS KNIGHT ON SQUARE, and > use a VS that changes the colour of the square? That is less legible

Re: Proposal to add standardized variation sequences for chess notation

2017-04-03 Thread Richard Wordingham
On Mon, 3 Apr 2017 23:35:52 +0100 Michael Everson <ever...@evertype.com> wrote: > On 3 Apr 2017, at 22:03, Richard Wordingham > <richard.wording...@ntlworld.com> wrote: The relevant text before was, "I'm talking about looking for a U+2654 glyph for ordinary text when a

Re: Combining Class of Thai Nonspacing_Marks

2017-04-04 Thread Richard Wordingham
onally, SARA UE is SARA I plus NIKHAHIT, and I suspect this is the origin of the etymologically odd form of ลึงค์ 'lingam'. > If, as Richard Wordingham wrote: "Unicode combining classes cannot be > changed. All that can be done is to enforce the order of characters > in normal

Re: Combining Class of Thai Nonspacing_Marks

2017-04-04 Thread Richard Wordingham
On Wed, 5 Apr 2017 10:45:43 +0700 "Gerriet M. Denkmann" <gerri...@icloud.com> wrote: > > On 4 Apr 2017, at 23:51,Richard Wordingham > > <richard.wording...@ntlworld.com> wrote: > > The order of MAITAIKHU and tone mark is significant - it should > >

Re: Proposal to add standardized variation sequences for chess notation

2017-04-04 Thread Richard Wordingham
On Wed, 5 Apr 2017 01:02:32 +0100 Michael Everson <ever...@evertype.com> wrote: > On 4 Apr 2017, at 18:54, Richard Wordingham > <richard.wording...@ntlworld.com> wrote: > > > > On Tue, 4 Apr 2017 01:30:05 +0100 > > Michael Everson <ever...@evertype.com>

Re: Proposal to add standardized variation sequences for chess notation

2017-04-05 Thread Richard Wordingham
On Mon, 3 Apr 2017 20:33:55 +0100 Richard Wordingham <richard.wording...@ntlworld.com> wrote: > On Sun, 2 Apr 2017 10:43:39 -0700 > Asmus Freytag <asm...@ix.netcom.com> wrote: > The basic text elements in the scheme other than boundary markers will > be: > > em

Re: Proposal to add standardized variation sequences for chess notation

2017-04-03 Thread Richard Wordingham
On Sun, 2 Apr 2017 10:43:39 -0700 Asmus Freytag wrote: > In these cases, explicit encoding would better cover what is desired: > a reliable way to mark a distinction between different symbols (the > two bishops are separate symbols, that also happen to express > distinct,

Re: Combining Class of Thai Nonspacing_Marks

2017-04-04 Thread Richard Wordingham
On Tue, 4 Apr 2017 09:39:57 +0700 "Gerriet M. Denkmann" wrote: > So the rule should be: > > A consonant may have zero or one tone/other marks and also zero or > one top/bottom vowels. Exceptions: > NIKHAHIT + tone mark (no top/bottom vowel) > MAITAIKHU + tone

Re: Proposal to add standardized variation sequences for chess notation

2017-04-04 Thread Richard Wordingham
On Tue, 4 Apr 2017 01:30:05 +0100 Michael Everson wrote: > > I'm trying to work out whether we need a variation sequence for > > "chesspiece in a sentence”. > > Of course! Haven’t you ever seen chess problem texts? Check out the > Fairy Chess proposal for encoding

Re: Proposal to add standardized variation sequences for chess notation

2017-04-03 Thread Richard Wordingham
On Mon, 3 Apr 2017 14:12:52 +0200 Michael Everson <ever...@evertype.com> wrote: > On 2 Apr 2017, at 18:27, Richard Wordingham > <richard.wording...@ntlworld.com> wrote: > I think you are seriously going the wrong way with this thinking. The > immediate parallel that comes

Unicode 10.0 Legitimacy of 0031 FE0E 20E3

2017-04-03 Thread Richard Wordingham
Where in the draft databases for Unicode 10.0 is Unicode 9.0 variation sequence declared legitimate? Without such a declaration, a font that had a special glyph for or a substitution specific to would not be

Re: Proposal to add standardized variation sequences for chess notation

2017-04-02 Thread Richard Wordingham
On Sun, 2 Apr 2017 08:19:00 +0200 Michael Everson wrote: > > On 1 Apr 2017, at 23:49, Philippe Verdy wrote: > > > > I think it's all about sizing so that white or black cells will > > align, independently of the piece that may be within it. > > A

Re: Proposal to add standardized variation sequences for chess notation

2017-04-02 Thread Richard Wordingham
On Sun, 2 Apr 2017 11:57:49 +0200 Michael Everson wrote: > That’s why the proposal says things like "set in Ludus in > 24 points with 26-point leading” where relevant. You forgot the most important setting though - that the higher-order protocols allow symbols to be

Re: Proposal to add standardized variation sequences for chess notation

2017-04-02 Thread Richard Wordingham
On Sun, 2 Apr 2017 11:57:49 +0200 Michael Everson <ever...@evertype.com> wrote: > On 2 Apr 2017, at 10:53, Richard Wordingham > <richard.wording...@ntlworld.com> wrote: > > while <U+2657, U+FE01> would be the white bishop on a black square. > > Perha

Re: Coloured Punctuation and Annotation

2017-04-06 Thread Richard Wordingham
On Thu, 6 Apr 2017 11:34:47 +0100 (BST) William_J_G Overington wrote: > The following post may be of interest. > > http://www.unicode.org/mail-arch/unicode-ml/y2002-m06/0337.html > > It is part of a thread from 2002 about the possibility of chromatic > fonts. > > I

Re: Coloured Punctuation and Annotation

2017-04-06 Thread Richard Wordingham
On Thu, 6 Apr 2017 01:19:42 -0400 Rebecca T <637...@gmail.com> wrote: > ... and > aside from usage I see > no difference between U+1F989 OWL 🦉 and U+13153 EGYPTIAN HIEROGLYPH > G017 𓅓. OWL does not have a prescribed attitude. On the other hand, if G017 were not body side on and head face on, I

Re: Coloured Punctuation and Annotation

2017-04-07 Thread Richard Wordingham
On Thu, 6 Apr 2017 13:17:36 -0700 Asmus Freytag wrote: > While it appears possible, after Khaled's demonstration, I still > think that the use of "white ink" instead of the "white" parts of a > character being treated "transparent" is far from standard text > presentation.

Re: Coloured Punctuation and Annotation

2017-04-05 Thread Richard Wordingham
On Thu, 6 Apr 2017 01:11:09 +0100 Michael Everson <ever...@evertype.com> wrote: > On 5 Apr 2017, at 22:48, Richard Wordingham > <richard.wording...@ntlworld.com> wrote: > > > I tried to read it from UTS#51 ‘Unicode Emoji', which is not part > > of TUS,

Re: "A Programmer's Introduction to Unicode"

2017-03-12 Thread Richard Wordingham
On Sun, 12 Mar 2017 20:02:28 +0100 "Janusz S. Bien" wrote: > If the basic notion has to be referred in a cumbersome way as > "extended grapheme cluster" then it is easier to talk about "Unicode > characters" despite the fact that they have a rather loose relation > to

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 15:26:00 -0700 Manish Goregaokar wrote: > Do you have examples of AA being split that way (and further reading)? > I think I'm aware of what you're talking about, but would love to read > more about it. Just googling for the three words 'Sanskrit',

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 20:20:25 -0400 "Mark E. Shoulson" wrote: > Sanskrit external vowel sandhi is comparatively > straightforward (compared to consonant sandhi), and it frequently > loses information. A *or* AA plus I is E; A *or* AA plus U is O (you > need A + O to get AU).

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 19:18:00 + Alastair Houghton wrote: > IMO, returning code points by index is a mistake. It over-emphasises > the importance of the code point, which helps to continue the notion > in some developers’ minds that code points are somehow

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 23:10:11 +0200 Khaled Hosny wrote: > But there are many text operations that require access to Unicode code > points. Take for example text layout, as mapping characters to glyphs > and back has to operate on code points. The idea that you never need >

Re: Translations of city names

2017-03-01 Thread Richard Wordingham
On Wed, 1 Mar 2017 12:56:23 -0800 Jean Aurambault wrote: > I'm wondering if there is any standard that defines a universal city > id (similar to country codes). ISO 3166-2 defines codes for some cities, but it's uneven. However, what's a city? Does Constantinople

Re: Proposal to add standardized variation sequences for chess notation

2017-04-03 Thread Richard Wordingham
On Mon, 3 Apr 2017 15:07:38 -0700 "Asmus Freytag (c)" wrote: > Having the system use specific character codes for the empties and > variation selectors for the pieces is a needless complication; just > duplicate the few pieces with a hatched background. (The precise >

Re: Proposal to add standardized variation sequences for chess notation

2017-04-03 Thread Richard Wordingham
On Mon, 3 Apr 2017 22:48:31 +0100 Michael Everson wrote: > Yes, this is what I’ve proposed. I was explaining it to Asmus and others with similar misunderstandings. Richard.

Re: Combining Class of Thai Nonspacing_Marks

2017-04-03 Thread Richard Wordingham
On Mon, 3 Apr 2017 14:12:51 +0700 "Gerriet M. Denkmann" wrote: > The Combining Class is used for normalisation of strings. > Normalisation of strings is important for filenames in filesystems. > > As far as I know, a Thai consonant (Lo, Other_Letter) can have > several
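The point about combining classes and normalisation can be checked with a small Python sketch. It is only an illustration: the choice of marks and the sample sequences are mine, not taken from the thread, and the values printed come from whatever version of the Unicode Character Database the local Python ships.

```python
import unicodedata

# A few Thai marks that come up in this thread (selection is illustrative).
marks = ["\u0E34", "\u0E38", "\u0E3A", "\u0E47", "\u0E48", "\u0E4D"]
for ch in marks:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")

# Two marks with different non-zero classes are put into a fixed order:
# MAI EK (ccc 107) typed before SARA U (ccc 103) is swapped by normalisation.
typed = "\u0E01\u0E48\u0E38"
normalised = unicodedata.normalize("NFC", typed)

# A mark with ccc 0, such as SARA I, is never reordered, so this
# sequence is left exactly as typed.
blocked = "\u0E01\u0E48\u0E34"
blocked_normalised = unicodedata.normalize("NFC", blocked)
```

Because SARA I, MAITAIKHU and NIKHAHIT all have combining class 0, normalisation cannot tidy up their order relative to a tone mark; any such ordering has to be enforced by other means.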

Coloured Punctuation and Annotation

2017-04-05 Thread Richard Wordingham
In topic 'Proposal to add standardized variation sequences for chess notation', on Wed, 5 Apr 2017 03:05:16 -0700 Asmus Freytag <asm...@ix.netcom.com> wrote: > On 4/5/2017 1:10 AM, Richard Wordingham wrote: > > A piece with a *white* background is different to a piece tha

Re: Proposal to add standardized variation sequences for chess notation

2017-04-08 Thread Richard Wordingham
On Wed, 5 Apr 2017 14:08:03 +0100 Michael Everson <ever...@evertype.com> wrote: > On 5 Apr 2017, at 04:50, Richard Wordingham > <richard.wording...@ntlworld.com> wrote: > > >> Why would anyone make a font that supports the variants for > >> drawing

Re: Proposal to add standardized variation sequences for chess notation

2017-04-08 Thread Richard Wordingham
On Wed, 5 Apr 2017 20:32:44 +0100 Michael Everson wrote: > On 5 Apr 2017, at 20:13, Philippe Verdy wrote: > Chess characters aren’t emojis. That doesn't mean that solutions applicable to emojis might not be applicable elsewhere. > The logic of the

Re: Proposal to add standardized variation sequences for chess notation

2017-04-08 Thread Richard Wordingham
On Thu, 06 Apr 2017 18:26:39 +0200 Kent Karlsson wrote: > All the characters in the "chess board lines" (apart from spaces, if > any), are of bidi category ON or NSM. So there is no character that > "sets" a bidi direction of the lines ("paragraphs"). So if the bidi >

Re: Proposal to add standardized variation sequences for chess notation

2017-04-08 Thread Richard Wordingham
On Thu, 6 Apr 2017 11:08:43 +0200 (CEST) Christoph Päper <christoph.pae...@crissov.de> wrote: > Richard Wordingham <richard.wording...@ntlworld.com>: > > If the variation selectors are ignored, these simplify to: > > > > white square > > hatched square >

Re: Proposal to add standardized variation sequences for chess notation

2017-04-12 Thread Richard Wordingham via Unicode
On Wed, 12 Apr 2017 06:58:22 +0200 Philippe Verdy via Unicode wrote: > This is the same problem. The same problem as crossword grids where > we need also empty cells (and "black" cells which are equivalent to > an empty cell with a black square symbol instead of letters).

Rationale for IPC of Newa Dependent Vowels

2017-04-17 Thread Richard Wordingham via Unicode
I have doubts about the Indic_Positional_Category (InPC) values proposed for four new dependent vowels being added in Unicode 10.0.0. On examining the vowel chart (p1265 of http://www.unicode.org/Public/10.0.0/charts/CodeCharts.pdf) one may feel quite comfortable with assigning the property

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > When given a rendered representation people seem to uniformly count > conjuncts as multiple aksharas if rendered with visible halant, and as > a single akshara if they are rendered conjoined. Now,

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 14:14:00 -0700 Manish Goregaokar via Unicode <unicode@unicode.org> wrote: > On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode > <unicode@unicode.org> wrote: > > On Thu, 20 Apr 2017 11:17:05 -0700 > > Manish Goregaokar via Unico

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 00:08:24 -0500 Anshuman Pandey via Unicode <unicode@unicode.org> wrote: > > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > > <unicode@unicode.org> wrote: > > Now imagine you're > > typing Vedic Sanskrit, with its clusters

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 15:33:37 +0530 Shriramana Sharma via Unicode wrote: > All I can say is that Tamil script has eschewed most consonant cluster > ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I > used ZWNJ) i.o. श्रीमान्को is quite possible with

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
I was offered the following reply: > To my knowledge except in Tamil script vowel less consonants in > written form aren't considered as separate "akshara"s in native > terminology. Word-finally they seem to be being treated as such. To be more precise, a final cluster of one or more consonants

Counting Devanagari Aksharas

2017-04-19 Thread Richard Wordingham via Unicode
Is there consensus on how to count aksharas in the Devanagari script? The doubts I have relate to a visible halant in orthographic syllables other than the first. For example, according to 'Devanagari VIP Team Issues Report' http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 16:27:43 -0700 Manish Goregaokar via Unicode wrote: > > Do Hindi speakers really think of orthographic syllables as > > characters? > > When rendered as a cluster, yes? I've asked around, and folks seem to > insist on coupling it to the rendering.

Re: Counting Devanagari Aksharas

2017-04-23 Thread Richard Wordingham via Unicode
On Sun, 23 Apr 2017 05:40:29 +0300 Eli Zaretskii via Unicode wrote: > > The cursor moves to the cluster boundary, so there is much less of a > > problem with Emacs. > > But you wanted to highlight only part of the cluster, AFAIU. If I search for CGJ, highlighting it is

Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 00:36:26 +0530 Naena Guru via Unicode wrote: > The Unicode approach to Sanskrit and all Indic is flawed. Indic > should not be letter-assembly systems. > > Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of > the speech. Each writing

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 13:34:32 +0300 Eli Zaretskii via Unicode wrote: > AFAIR, Emacs allows one to _delete_ individual characters, > i.e. Backspace and C-d delete character-by-character, so the problem > shouldn't be so grave for imperfect typists. Deleting forwards by one

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 21:39:42 +0100 (BST) Julian Bradfield via Unicode wrote: > On 2017-04-22, Eli Zaretskii via Unicode wrote: > > I could imagine Emacs decomposing characters temporarily when only > > part of a cluster matches the search string.

Re: Unicode education in UK Schools

2017-07-08 Thread Richard Wordingham via Unicode
On Sat, 8 Jul 2017 09:04:39 -0700 Asmus Freytag via Unicode wrote: > But some handling > of combining mark (and also the new emoji sequences) would equally > constitute "basic" knowledge, with the Unicode algorithms like > sorting, Which major applications actually use the

Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-07-02 Thread Richard Wordingham via Unicode
On Sat, 01 Jul 2017 09:51:00 +0300 "a.lukyanov via Unicode" wrote: > Is it possible to design fonts that will render ẞ as SS? > > So we could choose between ẞ and SS by just selecting the proper > font, without changing the text itself. > > Or perhaps there will be a "font

Re: Counting Devanagari Aksharas

2017-04-26 Thread Richard Wordingham via Unicode
On Wed, 26 Apr 2017 08:48:13 +0300 Eli Zaretskii via Unicode <unicode@unicode.org> wrote: > > Date: Sun, 23 Apr 2017 22:59:49 +0100 > > From: Richard Wordingham <richard.wording...@ntlworld.com> > > Cc: Eli Zaretskii <e...@gnu.org> > > > > If

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode <unicode@unicode.org> wrote: > On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode > <unicode@unicode.org> wrote: > > Is there consensus on how to count aksharas in the Devanagari > > sc

Re: Go romanize! Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 20:53:12 +0530 Naena Guru via Unicode wrote: > Quote by Richard: > Unless this implies a spelling reform for many languages, I'd like to > see how this works for the Tai Tham script. I'm not happy with the > Romanisation I use to work round hostile

Re: Tibetan Paluta

2017-04-27 Thread Richard Wordingham via Unicode
On Thu, 27 Apr 2017 13:57:55 +0530 Srinidhi A via Unicode wrote: > The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for > avagraha. However it seems this character denotes pluta instead of > avagraha. Pluta is used for indicating elongation of vowel. > Similar

Re: Turtle Graphics Emoji

2017-07-29 Thread Richard Wordingham via Unicode
On Fri, 28 Jul 2017 13:22:22 +0100 (BST) William_J_G Overington via Unicode wrote: > I have been thinking about having Turtle Graphics Emoji as an > educational and fun idea. I trust you are aware of the widespread feeling that there is already an excessive number of turtle

Re: Version linking?

2017-08-17 Thread Richard Wordingham via Unicode
On Thu, 17 Aug 2017 18:34:56 +0530 Shriramana Sharma via Unicode wrote: > Thanks for your reply, but how can characters be used portably if they > are not part of the published standard yet? Or is it that hereafter > both Unicode Standard + Unicode Emoji Standard will be

Re: How to Add Beams to Notes

2017-05-01 Thread Richard Wordingham via Unicode
On Mon, 1 May 2017 23:03:53 + Michael Bear via Unicode wrote: > “Rather than using "unused code positions", I would always recommend > to use some of the Private Use code points.” Consider it done. > > “What is the intended usage of your font? Music

Re: Are Emoji ZWJ sequences characters?

2017-05-15 Thread Richard Wordingham via Unicode
On Mon, 15 May 2017 16:14:23 + Peter Constable via Unicode wrote: > So, your helpful person was, indeed, helpful, giving you correct > information: ZWJ sequences are not _characters_ and have no > implications for ISO/IEC 10646. Except in so far as the claimed ligature

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Wed, 17 May 2017 13:37:51 -0700 Doug Ewell via Unicode <unicode@unicode.org> wrote: > Richard Wordingham wrote: > > >> It is not at all clear what the intent of the encoder was - or even > >> if it's not just a problem with the data stream. E0 80 80 is no

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Wed, 17 May 2017 15:31:56 -0700 Doug Ewell via Unicode <unicode@unicode.org> wrote: > Richard Wordingham wrote: > > > So it was still a legal way for a non-UTF-8-compliant process! > > Anything is possible if you are non-compliant. You can encode U+263A > w

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Wed, 17 May 2017 13:41:56 -0700 Doug Ewell via Unicode wrote: > Perhaps surprisingly, it's already too late. UTC approved this change > the day after the proposal was written. > > http://www.unicode.org/L2/L2017/17103.htm#151-C19 Approved for Unicode 11.0. Unicode 10.0

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Thu, 18 May 2017 02:04:55 +0200 Philippe Verdy via Unicode wrote: > I find it intriguing that the update intends to enforce the decoding > of the **shortest** sequences, but now wants to treat **maximal > sequences** as a single unit with arbitrary length. UTF-8 was >

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 20:08:52 +0900 "Martin J. Dürst via Unicode" wrote: > I agree with others that ICU should not be considered to have a > special status, it should be just one implementation among others. > [The next point is a side issue, please don't spend too much time

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 17:30:01 + Shawn Steele via Unicode wrote: > > Would you advocate replacing > > > e0 80 80 > > > with > > > U+FFFD U+FFFD U+FFFD (1) > > > rather than > > > U+FFFD (2) > > > It’s pretty clear what the
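For readers who want to see which of the two options a given decoder takes, here is a minimal Python sketch. It only reports what CPython's built-in UTF-8 decoder does with errors="replace"; it is not a statement of what the proposal or any other implementation requires, and the second test sequence is my own addition.

```python
# E0 must be followed by A0..BF, so E0 80 80 is ill-formed from the second byte.
overlong = b"\xe0\x80\x80"
decoded = overlong.decode("utf-8", errors="replace")
print(len(decoded), [f"U+{ord(c):04X}" for c in decoded])

# By contrast, a valid prefix that is cut short (E2 82 is the start of
# U+20AC) is replaced as a single unit, and decoding resumes at 'A'.
truncated = b"\xe2\x82A"
decoded_truncated = truncated.decode("utf-8", errors="replace")
print(len(decoded_truncated), [f"U+{ord(c):04X}" for c in decoded_truncated])
```

CPython therefore behaves like option (1) for e0 80 80, emitting one U+FFFD per byte, while still collapsing a truncated but otherwise valid prefix into a single U+FFFD.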

Fighting Spell-Checking by Renderers

2017-05-14 Thread Richard Wordingham via Unicode
One of the early problems encountered with Unicode was that there can be multiple ways of representing the same text. For many scripts, the solution was canonical equivalence - the multiple ways were declared to be equivalent, and anything that thought they had different meanings and should

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 10:01:03 +0300 Henri Sivonen via Unicode wrote: > Even so, I think even changing a recommendation of "best practice" > needs way better rationale than "feels right" or "ICU already does it" > when a) major browsers (which operate in the most prominent >

Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
Given two raw values of the Age property, defined in UCD file DerivedAge.txt, how is a computer program supposed to compare them? Apart from special handling for the value "Unassigned" and its short alias "NA", one used to be able to compare short values against short values and long values
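One way to make such comparisons robust is to parse the values into numbers before comparing. The following Python sketch assumes the short form looks like "10.0" and the long form like "V10_0"; the function name age_key and the decision to sort "Unassigned" after every version are my own assumptions, not anything defined by the UCD.

```python
import re

def age_key(value: str) -> tuple:
    """Map a raw Age value to a tuple that compares in version order."""
    if value in ("NA", "Unassigned"):
        # Assumption: treat unassigned as later than any real version.
        return (float("inf"),)
    m = re.fullmatch(r"V?(\d+)[._](\d+)", value)
    if m is None:
        raise ValueError(f"unrecognised Age value: {value!r}")
    return (int(m.group(1)), int(m.group(2)))

# Plain string comparison goes wrong once the major version has two digits.
print("10.0" < "9.0")                      # string order
print(age_key("9.0") < age_key("10.0"))    # numeric order
```

The same key works for short and long values alike, so the two forms can even be mixed in one comparison.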

Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
On Mon, 22 May 2017 15:10:02 -0700 Markus Scherer via Unicode <unicode@unicode.org> wrote: > On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > > Given two raw values of the Age property, defined in UCD file > >

Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
On Mon, 22 May 2017 17:19:08 -0500 Anshuman Pandey wrote: > I performed several operations on DerivedAge.txt a few months ago. > One basic example here: > > https://pandey.github.io/posts/unicode-growth-UCD-python.html So what happens if you apply it to Unicode Version 10.0?

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Richard Wordingham via Unicode
On Tue, 23 May 2017 05:29:33 -0700 Asmus Freytag via Unicode wrote: > On 5/23/2017 4:04 AM, Janusz S. Bien via Unicode wrote: > > Quote/Cytat - Manuel Strehl via Unicode (Tue > > 23 May 2017 11:33:24 AM CEST): > > > >> The rising standard in the world

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-18 Thread Richard Wordingham via Unicode
On Thu, 18 May 2017 09:58:43 +0100 Alastair Houghton via Unicode wrote: > On 18 May 2017, at 07:18, Henri Sivonen via Unicode > wrote: > > > > the decision complicates U+FFFD generation when validating UTF-8 by > > state machine. > > It *really*

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Richard Wordingham via Unicode
On Tue, 23 May 2017 17:44:49 -0700 Ken Whistler via Unicode wrote: > Ah, but keep in mind, if projecting out to Version 23.0 (in the year > 2030, by our current schedule), there is a significant chance that > particular UCD data files may have morphed into something

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 14:44:44 +0200 Hans Åberg via Unicode wrote: > > On 15 May 2017, at 12:21, Henri Sivonen via Unicode > > wrote: > ... > > I think Unicode should not adopt the proposed change. > > It would be useful, for use with filesystems, to

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 11:36:39 -0700 Markus Scherer via Unicode wrote: > Why do we care how we carve up an illegal sequence into subsequences? > Only for debugging and visual inspection. Maybe some process is using > illegal, overlong sequences to encode something special (à

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Fri, 26 May 2017 11:22:37 -0700 Ken Whistler via Unicode wrote: > On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote: > > The link provided about the PRI doesn't lead to the comments. > > > > PRI #121 (August, 2008) pre-dated the practice of keeping all the >

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Fri, 26 May 2017 21:41:49 + Shawn Steele via Unicode wrote: > I totally get the forward/backward scanning in sync without decoding > reasoning for some implementations, however I do not think that the > practices that benefit those should extend to other applications

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Tue, 30 May 2017 16:38:45 -0600 Karl Williamson via Unicode wrote: > Under Best Practices, how many REPLACEMENT CHARACTERs should the > sequence generate? 0, 1, 2, 3, 4 ? > > In practice, how many do parsers generate? See Markus Kuhn's test page

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode wrote: > Well, working from the *current* specification: > > FC 80 80 80 80 80 > and > FF FF FF FF FF FF > > are equal trash, uninterpretable as *anything* in UTF-8. > > By definition D39b, either sequence of
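As a quick check of how one current decoder treats these two byte strings, the Python sketch below feeds both to CPython's UTF-8 codec. It shows only CPython's behaviour, which is one implementation among many, and says nothing about the definition cited in the message.

```python
samples = [bytes.fromhex("FC8080808080"), bytes.fromhex("FFFFFFFFFFFF")]

results = []
for raw in samples:
    # Strict decoding rejects both sequences outright.
    try:
        raw.decode("utf-8")
        strict_ok = True
    except UnicodeDecodeError:
        strict_ok = False
    # With replacement, count how many U+FFFD come out.
    replaced = raw.decode("utf-8", errors="replace")
    results.append((strict_ok, replaced.count("\ufffd"), len(replaced)))
    print(raw.hex(" "), "strict ok:", strict_ok, "U+FFFD count:", replaced.count("\ufffd"))
```

Neither FC nor FF can start a sequence in this decoder, so every byte is reported separately and both inputs come out the same way.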

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 19:19:51 -0700 Ken Whistler via Unicode wrote: > > and therefore should start a > > sequence of 6 characters. > > That is completely false, and has nothing to do with the current > definition of UTF-8. > > The current, normative definition of UTF-8,

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode <unicode@unicode.org> wrote: > On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote: > > You were implicitly invited to argue that there was no need to > > handle 5 and 6 byte invalid sequences. > >

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-05 Thread Richard Wordingham via Unicode
On Mon, 5 Jun 2017 13:08:06 +0900 "Martin J. Dürst via Unicode" <unicode@unicode.org> wrote: > On 2017/06/02 04:54, Doug Ewell via Unicode wrote: > > Richard Wordingham wrote: > > > >> even supporting 6-byte patterns just in case 20.1 bits e

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 19:24:04 + Shawn Steele via Unicode wrote: > It seems to me that being able to use a data stream of ambiguous > quality in another application with predictable results, then that > stream should be “repaired” prior to being handed over. Then both >

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 12:32:08 +0300 Henri Sivonen via Unicode <unicode@unicode.org> wrote: > On Wed, May 31, 2017 at 8:11 PM, Richard Wordingham via Unicode > <unicode@unicode.org> wrote: > > On Wed, 31 May 2017 15:12:12 +0300 > > Henri Sivonen via Unicode <unicod

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 01 Jun 2017 12:54:45 -0700 Doug Ewell via Unicode <unicode@unicode.org> wrote: > Richard Wordingham wrote: > > > even supporting 6-byte patterns just in case 20.1 bits eventually > > turn out not to be enough, > > Oh, gosh, here we go with this. You we

Re: How to Add Beams to Notes

2017-05-04 Thread Richard Wordingham via Unicode
On Thu, 4 May 2017 05:01:17 +0200 Philippe Verdy via Unicode wrote: > Rendering Devanagari with OpenType does not require any PUA > assignment in that font for variants. The sequences are mapped > directly using subtables and the rules defined in OpenType for that > script.

Re: Sutton SignWriting PDF

2017-05-07 Thread Richard Wordingham via Unicode
On Sat, 6 May 2017 12:54:07 + Michael Bear via Unicode wrote: > If I open the Sutton SignWriting code chart in Mozilla Firefox, the > glyphs in the tables are blank. I have no idea why. If I open it in > Microsoft Edge, however, it works fine. Do you know why this is?

Re: How to Add Beams to Notes

2017-05-07 Thread Richard Wordingham via Unicode
On Fri, 5 May 2017 18:46:17 + Michael Bear via Unicode wrote: > But > if the cry for space gets REALLY desperate, I’ll merge identical > glyphs into one glyph. Obviously, I won’t do this for more > problematic merges, only glyphs in similar scripts with similar >

Re: abstract characters, semantics, meaningful transformations ... Was: Tibetan Paluta

2017-05-01 Thread Richard Wordingham via Unicode
On Mon, 1 May 2017 19:49:27 +0530 Naena Guru via Unicode wrote: > The purpose of writing is to represent speech. It is not some secret > that demi-gods created Sarasvati and Thoth would be offended at being called mere demi-gods. > sound => letter that is the basis for

Unicode is more than shapes (was: Tibetan Paluta)

2017-05-01 Thread Richard Wordingham via Unicode
On Mon, 1 May 2017 07:17:05 +0200 Philippe Verdy via Unicode wrote: > 2017-04-29 21:21 GMT+02:00 Naena Guru via Unicode > : > > Anyway, Unicode is only about DISPLAYING a script: There's a shape > > here; Let's find how to get it by assembling other

Re: How to Add Beams to Notes

2017-05-03 Thread Richard Wordingham via Unicode
On Tue, 2 May 2017 05:08:27 +0200 Philippe Verdy via Unicode wrote: > Consider also that the BMP is almost full, the remaining few holes > are kept for isolated characters that may be added to existing > scripts, or permanently reserved to avoid clashes with legacy >

Re: How to Add Beams to Notes

2017-05-04 Thread Richard Wordingham via Unicode
On Thu, 4 May 2017 23:13:08 + Michael Bear via Unicode wrote: > I plan to do everything in the plane EXCEPT for the surrogates, which > you’re not supposed to encode in fonts anyway, which leaves room for > about 2,048 more glyphs for OpenType features. There are, if I

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 15:12:12 +0300 Henri Sivonen via Unicode wrote: > The write-up mentions > https://bugs.chromium.org/p/chromium/issues/detail?id=662822#c13 . I'd > like to draw everyone's attention to that bug, which is real-world > evidence of a bug arising from two

Western numeral diacritics in complex scripts

2017-09-17 Thread Richard Wordingham via Unicode
In philological work, one encounters the problem that two or more abstract characters have the same 'natural' transliteration; the same problem can apply to reconstructed phonemes, where there is no sound indication of the actual pronunciation. A common solution is to use a subscript or

Normalise Tai Tham or not?

2017-10-10 Thread Richard Wordingham via Unicode
I'm preparing to share a spell-checker for Northern Thai in the Tai Tham script, and I'm having difficulty deciding whether to offer corrections in NFC/NFD or unnormalised. The problem arises in closed syllables with tone marks. For example, ᨠᩥ᩠᩵ᨶ /kin/ 'smell', has two canonically equivalent
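
As an illustration of the canonical-equivalence issue raised in this message, here is a minimal Python sketch. The code point sequences are an assumption about how the example word might be typed (tone mark before SAKOT, or SAKOT before tone mark); they are not taken from the message, and the helper name `canonically_equivalent` is hypothetical. It relies only on the standard `unicodedata` module.

```python
import unicodedata

# Assumed code points for the example word (not confirmed by the message):
KA = "\u1A20"      # TAI THAM LETTER HIGH KA
SIGN_I = "\u1A65"  # TAI THAM VOWEL SIGN I (combining class 0)
SAKOT = "\u1A60"   # TAI THAM SIGN SAKOT (combining class 9)
TONE_1 = "\u1A75"  # TAI THAM SIGN TONE-1 (combining class 230)
NA = "\u1A36"      # TAI THAM LETTER NA

# One plausible typing order: tone mark first, then SAKOT + NA.
typed = KA + SIGN_I + TONE_1 + SAKOT + NA
# Canonical ordering sorts SAKOT (9) ahead of the tone mark (230).
normalised = KA + SIGN_I + SAKOT + TONE_1 + NA


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if the two strings have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(typed == normalised)                        # code-point comparison
    print(canonically_equivalent(typed, normalised))  # comparison after NFD
    print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", typed)])
```

A spell-checker that compares normalised forms would treat both spellings as the same word; whether it should then offer corrections in normalised or unnormalised form is the question the message poses.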

Re: Normalise Tai Tham or not?

2017-10-10 Thread Richard Wordingham via Unicode
On Tue, 10 Oct 2017 22:46:20 +0300 Eli Zaretskii via Unicode <unicode@unicode.org> wrote: > > Date: Tue, 10 Oct 2017 20:00:12 +0100 > > From: Richard Wordingham via Unicode <unicode@unicode.org> > > > > 4) The pressure on search tools to respect canonical e

Re: Unicode education in Schools

2017-08-24 Thread Richard Wordingham via Unicode
On Thu, 24 Aug 2017 17:17:10 + Andre Schappo via Unicode wrote: > So, I consider it important to familiarise students with SMP > characters as well as BMP characters. Then when they develop software > they will, at the start, be thinking beyond ASCII and Unicode BMP >

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 18:55:25 +0300 Eli Zaretskii via Unicode <unicode@unicode.org> wrote: > > Date: Sat, 26 Aug 2017 16:09:33 +0100 > > From: Richard Wordingham via Unicode <unicode@unicode.org> > > It shouldn't. UTF-16 works just like UTF-8, except that

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 12:57:37 +0100 (BST) William_J_G Overington via Unicode wrote: > UTF-16 is very useful. I use it in my research project. > If the byte content of a UTF-16 file is displayed in a hexadecimal > display then for all plane 0 characters the byte content of

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 09:36:00 +0300 Eli Zaretskii via Unicode <unicode@unicode.org> wrote: > > Date: Fri, 25 Aug 2017 00:23:40 +0100 > > From: Richard Wordingham via Unicode <unicode@unicode.org> > > > > On Thu, 24 Aug 2017 17:17:10 + > > Andre Schap

Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 01:24:36 +0200 Philippe Verdy via Unicode <unicode@unicode.org> wrote: > 2017-08-17 22:37 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > > Fortunately, there is no good evidence that the occurrence > > of multiple

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 21:20:45 +0300 Eli Zaretskii via Unicode <unicode@unicode.org> wrote: > > Date: Sat, 26 Aug 2017 18:52:03 +0100 > > From: Richard Wordingham via Unicode <unicode@unicode.org> > We are miscommunicating. My point was that programming for MS-Windows &
