Re: Proposal to add standardized variation sequences for chess notation

2017-04-12 Thread Richard Wordingham via Unicode
On Wed, 12 Apr 2017 06:58:22 +0200 Philippe Verdy via Unicode wrote: > This is the same problem. The same problem as crossword grids where > we need also empty cells (and "black" cells which are equivalent to > an empty cell with a black square symbol instead of letters). And black cells with no

Rationale for IPC of Newa Dependent Vowels

2017-04-17 Thread Richard Wordingham via Unicode
I have doubts about the Indic_Positional_Category (InPC) values proposed for four new dependent vowels being added in Unicode 10.0.0. On examining the vowel chart (p1265 of http://www.unicode.org/Public/10.0.0/charts/CodeCharts.pdf) one may feel quite comfortable with assigning the property values

Counting Devanagari Aksharas

2017-04-19 Thread Richard Wordingham via Unicode
Is there consensus on how to count aksharas in the Devanagari script? The doubts I have relate to a visible halant in orthographic syllables other than the first. For example, according to 'Devanagari VIP Team Issues Report' http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a derive

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
I was offered the following reply: > To my knowledge except in Tamil script vowel less consonants in > written form aren't considered as separate "akshara"s in native > terminology. Word-finally they seem to be being treated as such. To be more precise, a final cluster of one or more consonants

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 15:33:37 +0530 Shriramana Sharma via Unicode wrote: > All I can say is that Tamil script has eschewed most consonant cluster > ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I > used ZWNJ) i.o. श्रीमान्को is quite possible with existing technology. > The

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > When given a rendered representation people seem to uniformly count > conjuncts as multiple aksharas if rendered with visible halant, and as > a single akshara if they are rendered conjoined. Now, that's what I expected.

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 14:14:00 -0700 Manish Goregaokar via Unicode wrote: > On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode > wrote: > > On Thu, 20 Apr 2017 11:17:05 -0700 > > Manish Goregaokar via Unicode wrote: > >> I'm of the opinion that

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 00:08:24 -0500 Anshuman Pandey via Unicode wrote: > > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > > wrote: > > Now imagine you're > > typing Vedic Sanskrit, with its clusters and pitch indicators. > I tried typing Vedi

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode > wrote: > > Is there consensus on how to count aksharas in the Devanagari > > script? The doubts I have relate to a visible halant i

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 16:27:43 -0700 Manish Goregaokar via Unicode wrote: > > Do Hindi speakers really think of orthographic syllables as > > characters? > > When rendered as a cluster, yes? I've asked around, and folks seem to > insist on coupling it to the rendering. That argues that it's a u

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 13:34:32 +0300 Eli Zaretskii via Unicode wrote: > AFAIR, Emacs allows one to _delete_ individual characters, > i.e. Backspace and C-d delete character-by-character, so the problem > shouldn't be so grave for imperfect typists. Deleting forwards by one _character_ certainly ma

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 21:39:42 +0100 (BST) Julian Bradfield via Unicode wrote: > On 2017-04-22, Eli Zaretskii via Unicode wrote: > > I could imagine Emacs decomposing characters temporarily when only > > part of a cluster matches the search string. Assuming this would > > make sense to users of

Re: Counting Devanagari Aksharas

2017-04-23 Thread Richard Wordingham via Unicode
On Sun, 23 Apr 2017 05:40:29 +0300 Eli Zaretskii via Unicode wrote: > > The cursor moves to the cluster boundary, so there is much less of a > > problem with Emacs. > > But you wanted to highlight only part of the cluster, AFAIU. If I search for CGJ, highlighting it is frequently supremely us

Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 00:36:26 +0530 Naena Guru via Unicode wrote: > The Unicode approach to Sanskrit and all Indic is flawed. Indic > should not be letter-assembly systems. > > Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of > the speech. Each writing system then assigns a sha

Re: Go romanize! Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 20:53:12 +0530 Naena Guru via Unicode wrote: > Quote by Richard: > Unless this implies a spelling reform for many languages, I'd like to > see how this works for the Tai Tham script. I'm not happy with the > Romanisation I use to work round hostile rendering engines. (My > s

Re: Counting Devanagari Aksharas

2017-04-25 Thread Richard Wordingham via Unicode
On Wed, 26 Apr 2017 08:48:13 +0300 Eli Zaretskii via Unicode wrote: > > Date: Sun, 23 Apr 2017 22:59:49 +0100 > > From: Richard Wordingham > > Cc: Eli Zaretskii > > > > If I search for CGJ, highlighting it is frequently supremely > > useless. I want to know where it is; highlighting is merely

Re: Tibetan Paluta

2017-04-27 Thread Richard Wordingham via Unicode
On Thu, 27 Apr 2017 13:57:55 +0530 Srinidhi A via Unicode wrote: > The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for > avagraha. However it seems this character denotes pluta instead of > avagraha. Pluta is used for indicating elongation of vowel. > Similar character with identical

Unicode is more than shapes (was: Tibetan Paluta)

2017-05-01 Thread Richard Wordingham via Unicode
On Mon, 1 May 2017 07:17:05 +0200 Philippe Verdy via Unicode wrote: > 2017-04-29 21:21 GMT+02:00 Naena Guru via Unicode > : > > Anyway, Unicode is only about DISPLAYING a script: There's a shape > > here; Let's find how to get it by assembling other shapes or by > > creating a code point for it.

Re: abstract characters, semantics, meaningful transformations ... Was: Tibetan Paluta

2017-05-01 Thread Richard Wordingham via Unicode
On Mon, 1 May 2017 19:49:27 +0530 Naena Guru via Unicode wrote: > The purpose of writing is to represent speech. It is not some secret > that demi-gods created Sarasvati and Thoth would be offended at being called mere demi-gods. > sound => letter that is the basis for writing. "=>" is not a

Re: How to Add Beams to Notes

2017-05-01 Thread Richard Wordingham via Unicode
On Mon, 1 May 2017 23:03:53 + Michael Bear via Unicode wrote: > “Rather than using "unused code positions", I would always recommend > to use some of the Private Use code points.” Consider it done. > > “What is the intended usage of your font? Music score > applications? othe

Re: How to Add Beams to Notes

2017-05-03 Thread Richard Wordingham via Unicode
On Tue, 2 May 2017 05:08:27 +0200 Philippe Verdy via Unicode wrote: > Consider also that the BMP is almost full, the remaining few holes > are kept for isolated characters that may be added to existing > scripts, or permanently reserved to avoid clashes with legacy > softwares using simple code r

Re: How to Add Beams to Notes

2017-05-04 Thread Richard Wordingham via Unicode
On Thu, 4 May 2017 05:01:17 +0200 Philippe Verdy via Unicode wrote: > Rendering Devanagari with OpenType does not require any PUA > assignment in that font for variants. The sequences are mapped > directly using subtables and the rules defined in OpenType for that > script. Fonts just use their o

Re: How to Add Beams to Notes

2017-05-04 Thread Richard Wordingham via Unicode
On Thu, 4 May 2017 23:13:08 + Michael Bear via Unicode wrote: > I plan to do everything in the plane EXCEPT for the surrogates, which > you’re not supposed to encode in fonts anyway, which leaves room for > about 2,048 more glyphs for OpenType features. There are, if I avoided double countin

Re: Sutton SignWriting PDF

2017-05-07 Thread Richard Wordingham via Unicode
On Sat, 6 May 2017 12:54:07 + Michael Bear via Unicode wrote: > If I open the Sutton SignWriting code chart in Mozilla Firefox, the > glyphs in the tables are blank. I have no idea why. If I open it in > Microsoft Edge, however, it works fine. Do you know why this is? It smacks of being a fa

Re: How to Add Beams to Notes

2017-05-07 Thread Richard Wordingham via Unicode
On Fri, 5 May 2017 18:46:17 + Michael Bear via Unicode wrote: > But > if the cry for space gets REALLY desperate, I’ll merge identical > glyphs into one glyph. Obviously, I won’t do this for more > problematic merges, only glyphs in similar scripts with similar > features. (e.g. I would repre

Fighting Spell-Checking by Renderers

2017-05-13 Thread Richard Wordingham via Unicode
One of the early problems encountered with Unicode was that there can be multiple ways of representing the same text. For many scripts, the solution was canonical equivalence - the multiple ways were declared to be equivalent, and anything that thought they had different meanings and should *there

Re: Are Emoji ZWJ sequences characters?

2017-05-15 Thread Richard Wordingham via Unicode
On Mon, 15 May 2017 16:14:23 + Peter Constable via Unicode wrote: > So, your helpful person was, indeed, helpful, giving you correct > information: ZWJ sequences are not _characters_ and have no > implications for ISO/IEC 10646. Except in so far as the claimed ligature changes the meaning of

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-15 Thread Richard Wordingham via Unicode
On Mon, 15 May 2017 21:38:26 + David Starner via Unicode wrote: > > and the fact is that handling surrogates (which is what proponents > > of UTF-8 or UCS-4 usually focus on) is no more complicated than > > handling combining characters, which you have to do anyway. > Not necessarily; you ca

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 10:01:03 +0300 Henri Sivonen via Unicode wrote: > Even so, I think even changing a recommendation of "best practice" > needs way better rationale than "feels right" or "ICU already does it" > when a) major browsers (which operate in the most prominent > environment of broken a

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 20:08:52 +0900 "Martin J. Dürst via Unicode" wrote: > I agree with others that ICU should not be considered to have a > special status, it should be just one implementation among others. > [The next point is a side issue, please don't spend too much time on > it.] I find it

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 14:44:44 +0200 Hans Åberg via Unicode wrote: > > On 15 May 2017, at 12:21, Henri Sivonen via Unicode > > wrote: > ... > > I think Unicode should not adopt the proposed change. > > It would be useful, for use with filesystems, to have Unicode > codepoint markers that indi

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 17:30:01 + Shawn Steele via Unicode wrote: > > Would you advocate replacing > > > e0 80 80 > > > with > > > U+FFFD U+FFFD U+FFFD (1) > > > rather than > > > U+FFFD (2) > > > It’s pretty clear what the intent of the encoder was
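
The two options in the quoted question can be compared directly. The following is only a sketch, assuming CPython 3's built-in UTF-8 decoder with `errors="replace"`; `collapse_replacements` is a hypothetical helper written for this illustration and is not part of any library or of the proposal itself.

```python
import re

ILL_FORMED = b"\xe0\x80\x80"  # the byte sequence from the quoted question


def collapse_replacements(text: str) -> str:
    """Hypothetical helper: merge each run of U+FFFD into a single U+FFFD."""
    return re.sub("\ufffd+", "\ufffd", text)


# Option (1): CPython's decoder emits one U+FFFD per rejected byte here,
# because E0 cannot be followed by 80 and a lone 80 is never a lead byte.
per_byte = ILL_FORMED.decode("utf-8", errors="replace")

# Option (2): one U+FFFD for the whole sequence, emulated by collapsing.
collapsed = collapse_replacements(per_byte)

print([f"U+{ord(c):04X}" for c in per_byte])
print([f"U+{ord(c):04X}" for c in collapsed])
```

Collapsing after the fact is only an emulation: it would also merge two unrelated ill-formed sequences that happen to be adjacent, which a decoder implementing option (2) natively need not do.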

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Richard Wordingham via Unicode
On Tue, 16 May 2017 11:36:39 -0700 Markus Scherer via Unicode wrote: > Why do we care how we carve up an illegal sequence into subsequences? > Only for debugging and visual inspection. Maybe some process is using > illegal, overlong sequences to encode something special (à la Java > string serial

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Wed, 17 May 2017 13:41:56 -0700 Doug Ewell via Unicode wrote: > Perhaps surprisingly, it's already too late. UTC approved this change > the day after the proposal was written. > > http://www.unicode.org/L2/L2017/17103.htm#151-C19 Approved for Unicode 11.0. Unicode 10.0 has yet to be release

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Wed, 17 May 2017 13:37:51 -0700 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > >> It is not at all clear what the intent of the encoder was - or even > >> if it's not just a problem with the data stream. E0 80 80 is not > >> permitted, it's garbage. An encoder can't "intend" it

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Wed, 17 May 2017 15:31:56 -0700 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > > So it was still a legal way for a non-UTF-8-compliant process! > > Anything is possible if you are non-compliant. You can encode U+263A > with 9,786 FF bytes followed by a terminating FE byte an

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Richard Wordingham via Unicode
On Thu, 18 May 2017 02:04:55 +0200 Philippe Verdy via Unicode wrote: > I find it intriguing that the update intends to enforce the decoding > of the **shortest** sequences, but now wants to treat **maximal > sequences** as a single unit with arbitrary length. UTF-8 was > designed to work only with

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-18 Thread Richard Wordingham via Unicode
On Thu, 18 May 2017 09:58:43 +0100 Alastair Houghton via Unicode wrote: > On 18 May 2017, at 07:18, Henri Sivonen via Unicode > wrote: > > > > the decision complicates U+FFFD generation when validating UTF-8 by > > state machine. > > It *really* doesn’t. Even if you’re hell bent on using a

Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
Given two raw values of the Age property, defined in UCD file DerivedAge.txt, how is a computer program supposed to compare them? Apart from special handling for the value "Unassigned" and its short alias "NA", one used to be able to compare short values against short values and long values against
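
One way plain string comparison of Age values can go wrong is a two-digit major version: as text, "9.0" sorts after "10.0". A minimal sketch of a numeric comparison key follows, assuming short values look like `9.0` and long values like `V9_0`; the function name `age_key` and the choice to sort "Unassigned" after every version are assumptions made for this illustration, not something the UCD prescribes.

```python
from typing import Tuple


def age_key(raw: str) -> Tuple[int, int]:
    """Turn a raw Age value such as "9.0" or "V10_0" into (major, minor)."""
    text = raw.strip()
    if text in ("NA", "Unassigned"):
        # Assumption for this sketch: unassigned sorts after every version.
        return (10**9, 0)
    if text.startswith("V"):
        # Long alias form, e.g. "V10_0" -> "10.0"
        text = text[1:].replace("_", ".")
    major, minor = text.split(".")
    return (int(major), int(minor))


# String comparison gets the order wrong; the numeric key does not.
print("9.0" < "10.0")
print(age_key("9.0") < age_key("10.0"))
```

Because both alias forms map to the same tuple, short and long values can also be compared with each other.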

Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
On Mon, 22 May 2017 15:10:02 -0700 Markus Scherer via Unicode wrote: > On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > > Given two raw values of the Age property, defined in UCD file > > DerivedAge.txt, how is a c

Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Richard Wordingham via Unicode
On Mon, 22 May 2017 17:19:08 -0500 Anshuman Pandey wrote: > I performed several operations on DerivedAge.txt a few months ago. > One basic example here: > > https://pandey.github.io/posts/unicode-growth-UCD-python.html So what happens if you apply it to Unicode Version 10.0? Are the versions s

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Richard Wordingham via Unicode
On Tue, 23 May 2017 05:29:33 -0700 Asmus Freytag via Unicode wrote: > On 5/23/2017 4:04 AM, Janusz S. Bien via Unicode wrote: > > Quote/Cytat - Manuel Strehl via Unicode (Tue > > 23 May 2017 11:33:24 AM CEST): > > > >> The rising standard in the world of web development (and others) > >> is ca

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Richard Wordingham via Unicode
On Tue, 23 May 2017 17:44:49 -0700 Ken Whistler via Unicode wrote: > Ah, but keep in mind, if projecting out to Version 23.0 (in the year > 2030, by our current schedule), there is a significant chance that > particular UCD data files may have morphed into something entirely > different. Reca

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Fri, 26 May 2017 11:22:37 -0700 Ken Whistler via Unicode wrote: > On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote: > > The link provided about the PRI doesn't lead to the comments. > > > > PRI #121 (August, 2008) pre-dated the practice of keeping all the > feedback comments togeth

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Tue, 30 May 2017 16:38:45 -0600 Karl Williamson via Unicode wrote: > Under Best Practices, how many REPLACEMENT CHARACTERs should the > sequence generate? 0, 1, 2, 3, 4 ? > > In practice, how many do parsers generate? See Markus Kuhn's test page http://www.cl.cam.ac.uk/~mgk25/ucs/examples

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Richard Wordingham via Unicode
On Fri, 26 May 2017 21:41:49 + Shawn Steele via Unicode wrote: > I totally get the forward/backward scanning in sync without decoding > reasoning for some implementations, however I do not think that the > practices that benefit those should extend to other applications that > are happy with

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 15:12:12 +0300 Henri Sivonen via Unicode wrote: > The write-up mentions > https://bugs.chromium.org/p/chromium/issues/detail?id=662822#c13 . I'd > like to draw everyone's attention to that bug, which is real-world > evidence of a bug arising from two UTF-8 decoders within one

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 17:43:08 + Shawn Steele via Unicode wrote: > There also appears to be a special weight given to > non-minimally-encoded sequences. It would seem to me that none of > these illegal sequences should appear in practice, so we have either: > I do not understand the energy

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Richard Wordingham via Unicode
On Wed, 31 May 2017 19:24:04 + Shawn Steele via Unicode wrote: > It seems to me that being able to use a data stream of ambiguous > quality in another application with predictable results, then that > stream should be “repaired” prior to being handed over. Then both > endpoints would be usin

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 12:32:08 +0300 Henri Sivonen via Unicode wrote: > On Wed, May 31, 2017 at 8:11 PM, Richard Wordingham via Unicode > wrote: > > On Wed, 31 May 2017 15:12:12 +0300 > > Henri Sivonen via Unicode wrote: > >> I am not claiming it's too di

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 01 Jun 2017 12:54:45 -0700 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > > even supporting 6-byte patterns just in case 20.1 bits eventually > > turn out not to be enough, > > Oh, gosh, here we go with this. You were implicitly invited to argue that there was no need

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode wrote: > On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote: > > You were implicitly invited to argue that there was no need to > > handle 5 and 6 byte invalid sequences. > > > > Well, working from

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode wrote: > Well, working from the *current* specification: > > FC 80 80 80 80 80 > and > FF FF FF FF FF FF > > are equal trash, uninterpretable as *anything* in UTF-8. > > By definition D39b, either sequence of bytes, if encountered by a

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Richard Wordingham via Unicode
On Thu, 1 Jun 2017 19:19:51 -0700 Ken Whistler via Unicode wrote: > > and therefore should start a > > sequence of 6 characters. > > That is completely false, and has nothing to do with the current > definition of UTF-8. > > The current, normative definition of UTF-8, in the Unicode Standa

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-05 Thread Richard Wordingham via Unicode
On Mon, 5 Jun 2017 13:08:06 +0900 "Martin J. Dürst via Unicode" wrote: > On 2017/06/02 04:54, Doug Ewell via Unicode wrote: > > Richard Wordingham wrote: > > > >> even supporting 6-byte patterns just in case 20.1 bits eventually > >> turn out not to be enough, > > Sorry to be late with this

Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-07-02 Thread Richard Wordingham via Unicode
On Sat, 01 Jul 2017 09:51:00 +0300 "a.lukyanov via Unicode" wrote: > Is it possible to design fonts that will render ẞ as SS? > > So we could choose between ẞ and SS by just selecting the proper > font, without changing the text itself. > > Or perhaps there will be a "font feature" to select th

Re: Unicode education in UK Schools

2017-07-08 Thread Richard Wordingham via Unicode
On Sat, 8 Jul 2017 09:04:39 -0700 Asmus Freytag via Unicode wrote: > But some handling > of combining mark (and also the new emoji sequences) would equally > constitute "basic" knowledge, with the Unicode algorithms like > sorting, Which major applications actually use the Unicode Collation Algo

Re: Turtle Graphics Emoji

2017-07-29 Thread Richard Wordingham via Unicode
On Fri, 28 Jul 2017 13:22:22 +0100 (BST) William_J_G Overington via Unicode wrote: > I have been thinking about having Turtle Graphics Emoji as an > educational and fun idea. I trust you are aware of the widespread feeling that there is already an excessive number of turtle characters in Unicode

Re: Version linking?

2017-08-17 Thread Richard Wordingham via Unicode
On Thu, 17 Aug 2017 18:34:56 +0530 Shriramana Sharma via Unicode wrote: > Thanks for your reply, but how can characters be used portably if they > are not part of the published standard yet? Or is it that hereafter > both Unicode Standard + Unicode Emoji Standard will be parallelly > portable or

Re: Unicode education in Schools

2017-08-24 Thread Richard Wordingham via Unicode
On Thu, 24 Aug 2017 17:17:10 + Andre Schappo via Unicode wrote: > So, I consider it important to familiarise students with SMP > characters as well as BMP characters. Then when they develop software > they will, at the start, be thinking beyond ASCII and Unicode BMP > characters. Just steer

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 12:57:37 +0100 (BST) William_J_G Overington via Unicode wrote: > UTF-16 is very useful. I use it in my research project. > If the byte content of a UTF-16 file is displayed in a hexadecimal > display then for all plane 0 characters the byte content of the > character codes ar

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 09:36:00 +0300 Eli Zaretskii via Unicode wrote: > > Date: Fri, 25 Aug 2017 00:23:40 +0100 > > From: Richard Wordingham via Unicode > > > > On Thu, 24 Aug 2017 17:17:10 + > > Andre Schappo via Unicode wrote: > > > >

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 18:55:25 +0300 Eli Zaretskii via Unicode wrote: > > Date: Sat, 26 Aug 2017 16:09:33 +0100 > > From: Richard Wordingham via Unicode > > It shouldn't. UTF-16 works just like UTF-8, except that the code > > units are bigger. > Not

Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 01:24:36 +0200 Philippe Verdy via Unicode wrote: > 2017-08-17 22:37 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > > Fortunately, there is no good evidence that the occurrence > > of multiple distinct left matras is

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 21:20:45 +0300 Eli Zaretskii via Unicode wrote: > > Date: Sat, 26 Aug 2017 18:52:03 +0100 > > From: Richard Wordingham via Unicode > We are miscommunicating. My point was that programming for MS-Windows > needs a good understanding of what the UTF-16 surr

Re: Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-26 Thread Richard Wordingham via Unicode
On Sat, 26 Aug 2017 21:52:19 +0200 Philippe Verdy via Unicode wrote: > 2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > Of course SHY in this use is not suitable, but who knows if one will > not need this to split in two parts what wo

Re: Unicode education in Schools

2017-08-26 Thread Richard Wordingham via Unicode
On Fri, 25 Aug 2017 09:36:44 -0400 John W Kennedy wrote: > Just a reminder that in Apple’s Swift a “Character” is anything that > looks like a character, including a letter with any theoretically > unlimited stack of diacritics, a flag, or a skin-toned emoji, and all > Swift functions working wit

Re: Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-27 Thread Richard Wordingham via Unicode
On Sun, 27 Aug 2017 19:55:31 +0200 Philippe Verdy via Unicode wrote: > 2017-08-27 6:06 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > Canonical reordering is unambiguously referring to the canonical > equivalences in TUS. These are automated and can

Western numeral diacritics in complex scripts

2017-09-17 Thread Richard Wordingham via Unicode
In philological work, one encounters the problem that two or more abstract characters have the same 'natural' transliteration; the same problem can apply to reconstructed phonemes, where there is no sound indication of the actual pronunciation. A common solution is to use a subscript or superscrip

Normalise Tai Tham or not?

2017-10-10 Thread Richard Wordingham via Unicode
I'm preparing to share a spell-checker for Northern Thai in the Tai Tham script, and I'm having difficulty deciding whether to offer corrections in NFC/NFD or unnormalised. The problem arises in closed syllables with tone marks. For example, ᨠᩥ᩠᩵ᨶ /kin/ 'smell', has two canonically equivalent enc
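
The canonical equivalence at issue can be checked with Python's standard `unicodedata` module. This is a sketch only: the code point sequence below is an assumed spelling chosen for illustration (HIGH KA, VOWEL SIGN I, then SAKOT and TONE-1 in either order, then NA), not necessarily the exact encoding of the word in the message.

```python
import unicodedata

SAKOT = "\u1A60"   # TAI THAM SIGN SAKOT, canonical combining class 9
TONE_1 = "\u1A75"  # TAI THAM SIGN TONE-1, canonical combining class 230

# Assumed spelling for illustration: consonant + vowel sign, the two marks
# in either order, then the final consonant.
head = "\u1A20\u1A65"  # TAI THAM LETTER HIGH KA + TAI THAM VOWEL SIGN I
tail = "\u1A36"        # TAI THAM LETTER NA

tone_first = head + TONE_1 + SAKOT + tail
sakot_first = head + SAKOT + TONE_1 + tail

# The two strings differ code point by code point...
print(tone_first == sakot_first)

# ...but normalisation reorders the marks by combining class, so both
# spellings end up identical in NFC (and likewise in NFD).
nfc_a = unicodedata.normalize("NFC", tone_first)
nfc_b = unicodedata.normalize("NFC", sakot_first)
print(nfc_a == nfc_b)
```

A spell-checker that compares suggestions after normalising both sides would treat the two spellings as one word; whether the suggestions it offers should themselves be normalised is the question the message raises.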

Re: Normalise Tai Tham or not?

2017-10-10 Thread Richard Wordingham via Unicode
On Tue, 10 Oct 2017 22:46:20 +0300 Eli Zaretskii via Unicode wrote: > > Date: Tue, 10 Oct 2017 20:00:12 +0100 > > From: Richard Wordingham via Unicode > > > > 4) The pressure on search tools to respect canonical equivalence is > > now relatively low. Some editors

Re: Normalise Tai Tham or not?

2017-10-11 Thread Richard Wordingham via Unicode
On Wed, 11 Oct 2017 13:10:26 +0300 Eli Zaretskii via Unicode wrote: > > Date: Tue, 10 Oct 2017 21:51:55 +0100 > > From: Richard Wordingham via Unicode > > > > > Emacs lately introduced character-folding in searches, but it's > > > turned off by defa

Re: ASCII v Unicode

2017-11-12 Thread Richard Wordingham via Unicode
On Fri, 3 Nov 2017 02:36:43 -0700 Asmus Freytag via Unicode wrote: > On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote: > > You may > find https://twitter.com/andreschappo/status/926163719331176450 amusing > 😀 > > André Schappo > > You're wildly off in your page count. > > The "book" part

Minimal Implementation of Unicode Collation Algorithm

2017-12-04 Thread Richard Wordingham via Unicode
May a collation algorithm that always compares all strings as equal be a compliant implementation of the Unicode Collation Algorithm (UTS #10)? If not, by which clause is it not compliant? Formally, this algorithm would require that all weights be zero. Would an implementation that supported no c

Re: Minimal Implementation of Unicode Collation Algorithm

2017-12-04 Thread Richard Wordingham via Unicode
On Mon, 4 Dec 2017 12:48:11 -0800 Markus Scherer via Unicode wrote: > On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > Would an implementation that supported no characters be compliant? > I guess so. I assume that wo

Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-08 Thread Richard Wordingham via Unicode
Apart from the likely but unmandated consequence of making editing Indic text more difficult (possibly contrary to the UK's Equality Act 2010), there is another difficulty that will follow directly from the currently proposed expansion of grapheme clusters (https://www.unicode.org/reports/tr29/prop

Aquaφοβία

2017-12-09 Thread Richard Wordingham via Unicode
Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0 implies that it might be considered desirable to have a word boundary in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C, U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which should be <006C, U+0310 COM

Re: Aquaφοβία

2017-12-09 Thread Richard Wordingham via Unicode
On Sat, 9 Dec 2017 16:08:22 +0100 Philippe Verdy wrote: > 2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > > > Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0 > > implies that it might be considered des

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-09 Thread Richard Wordingham via Unicode
On Sat, 9 Dec 2017 16:16:44 +0100 Mark Davis ☕️ via Unicode wrote: > 1. You make a good point about the GB9c. It should probably instead be > something like: > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant > > > Extend is a broader than necessary, and there are a few items that > have c

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
On Sun, 10 Dec 2017 21:14:18 -0800 Manish Goregaokar via Unicode wrote: > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant > > You can also explicitly request ligatureification with a ZWJ, so > perhaps this rule should be something like > > (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant >

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
On Mon, 11 Dec 2017 08:59:20 +0100 Mark Davis ☕️ via Unicode wrote: > The proposed rules do not distinguish the different visual forms that > a sequence of characters surrounding a virama can have, such as > >1. an explicit virama, or >2. a half-form is visible, or >3. a ligature is

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
don't work (or need > more research before #29 is finalized in May), it is fairly > straightforward to restrict the rule changes by modifying > http://www.unicode.org/reports/tr29/proposed.html#Virama to either > exclude particular scripts or include only particular scripts. > > Ma

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Richard Wordingham via Unicode
On Mon, 11 Dec 2017 21:45:23 + Cibu Johny (സിബു) wrote: > I am assuming the purpose of the grapheme cluster definition is to be > used for line spacing, vertical writing or cursor movement. Without > defining the purpose, it is hard for me to say if a ruleset is valid > or not. That is a very fa

Atomicity of Grapheme Clusters

2017-12-13 Thread Richard Wordingham via Unicode
I have been reviewing UAX#29 Unicode Text Segmentation because I have a feeling we will be trying to do too much with the concept of grapheme clusters, even with tailoring, when we extend it to include whole aksharas. What is the meaning of "Word boundaries, line boundaries, and sentence boundarie

Word_Break for Hieroglyphs

2017-12-14 Thread Richard Wordingham via Unicode
Is there any valid reason for Egyptian hieroglyphs to have Word_Break=ALetter rather than Complex_Context? So far as I am aware, hieroglyphs lack visible word breaks in both inscriptions and in modern transcriptions. Richard.

Re: Word_Break for Hieroglyphs

2017-12-14 Thread Richard Wordingham via Unicode
On Thu, 14 Dec 2017 18:11:33 + Andrew Glass via Unicode wrote: > We had some discussion on the sidelines of > the August UTC meeting at which time it became clear that more work > is needed as current property values are not entirely correct. > Currently, my Hieroglyphic energies are focused

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-14 Thread Richard Wordingham via Unicode
On Mon, 11 Dec 2017 21:45:23 + Cibu Johny (സിബു) wrote: > I am assuming the purpose of the grapheme cluster definition is to be > used for line spacing, vertical writing or cursor movement. Without > defining the purpose, it is hard for me to say if a ruleset is valid > or not. Assuming that pur

Re: Word_Break for Hieroglyphs

2017-12-16 Thread Richard Wordingham via Unicode
On Thu, 14 Dec 2017 15:53:13 +0100 Mark Davis ☕️ via Unicode wrote: > On Thu, Dec 14, 2017 at 3:22 PM, Michael Everson > wrote: > > NO. Clusters cannot be broken up just anywhere. > Does that mean that ancient inscriptions would leave gaps at the end > of lines in order to not break a cluster

Re: Word_Break for Hieroglyphs

2017-12-20 Thread Richard Wordingham via Unicode
On Mon, 18 Dec 2017 15:15:11 +0100 Serge Rosmorduc via Unicode wrote: > Hence, you have things like (like 5-6) : : the word ẖsy « small », > is cut between the two lines. The phonetic part is line 5, and the > bird determinative is alone on line 5, above the preposition « m », > which is itself

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-21 Thread Richard Wordingham via Unicode
On Thu, 21 Dec 2017 17:55:33 +0900 "Martin J. Dürst via Unicode" wrote: > On 2017/12/15 07:40, Richard Wordingham via Unicode wrote: > > On Mon, 11 Dec 2017 21:45:23 + > > Cibu Johny (സിബു) wrote: > >> For example see the poster with word ഉസ്താദ് broke

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-22 Thread Richard Wordingham via Unicode
On Fri, 22 Dec 2017 09:27:15 +0200 Eli Zaretskii via Unicode wrote: > > Date: Thu, 21 Dec 2017 22:04:37 -0800 > > Cc: Unicode Public > > From: Manish Goregaokar via Unicode > > > > However, Firefox deletes by code point. > > As does Emacs, btw. And deleting in that fashion from the right i

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-22 Thread Richard Wordingham via Unicode
On Thu, 21 Dec 2017 22:04:37 -0800 Manish Goregaokar via Unicode wrote: > > When deleting by backspace, the usual practice is to delete one > > Unicode > character for each key press. > > This seems to depend on the operating system and program involved. For > example, on OSX any native text i
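
The difference between the two backspace behaviours discussed here can be sketched as follows. This is only an illustration under stated assumptions: `backspace_code_point` removes a single code point, while `backspace_base_and_marks` (a hypothetical helper) removes a base character together with any trailing combining marks, which is a crude approximation and not UAX #29 grapheme cluster segmentation. Neither function is taken from Firefox, Emacs or any operating system.

```python
import unicodedata


def backspace_code_point(text):
    # Delete exactly one code point from the end of the text.
    return text[:-1]


def backspace_base_and_marks(text):
    # Crude approximation of cluster deletion: skip back over trailing
    # combining marks (general categories Mn, Mc, Me), then drop the base too.
    i = len(text)
    while i > 0 and unicodedata.category(text[i - 1]).startswith("M"):
        i -= 1
    return text[:max(i - 1, 0)]
```

For the two-code-point string U+0915 U+093E, the first function leaves the consonant U+0915 in place, whereas the second removes both code points in one key press.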

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-22 Thread Richard Wordingham via Unicode
On Fri, 22 Dec 2017 17:44:39 +0200 Eli Zaretskii via Unicode wrote: > You can always delete a codepoint at a given position in Emacs, > specifying the position by its number, but there are no user-level > commands to conveniently allow doing that in the middle of a grapheme > cluster. > > It was

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-31 Thread Richard Wordingham via Unicode
On Fri, 22 Dec 2017 17:44:39 +0200 Eli Zaretskii via Unicode wrote: > > Date: Fri, 22 Dec 2017 15:36:35 + > > From: Richard Wordingham via Unicode > > However, it seems > > that one has to modify the source code of Emacs to be able to edit > > in the middle of

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-01 Thread Richard Wordingham via Unicode
On Mon, 1 Jan 2018 13:24:29 +0530 Manish Goregaokar via Unicode wrote: > sounds very much like a > degenerate case to me. Generally yes, but I'm not sure that they'd be inappropriate for Egyptian hieroglyphs showing human beings. The choice of determinative can convey unpronounceable semantic

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-02 Thread Richard Wordingham via Unicode
On Tue, 2 Jan 2018 01:21:37 -0800 Asmus Freytag via Unicode wrote: > On 1/1/2018 6:52 AM, Richard Wordingham via Unicode wrote: > > Generally yes, but I'm not sure that they'd be inappropriate for > > Egyptian hieroglyphs showing human beings. The choice of >

Re: 0027, 02BC, 2019, or a new character?

2018-01-16 Thread Richard Wordingham via Unicode
On Mon, 15 Jan 2018 20:16:21 -0800 James Kass via Unicode wrote: > It will probably be the ASCII apostrophe. The stated intent favors > the apostrophe over diacritics or special characters to ensure that > the language can be input to computers with standard keyboards. Typing U+0027 into a word

Re: 0027, 02BC, 2019, or a new character?

2018-01-16 Thread Richard Wordingham via Unicode
On Mon, 15 Jan 2018 23:40:15 -0800 James Kass via Unicode wrote: > On a side note, wouldn't most of the "standard keyboards" currently in > Kazakhstan be labelled in Cyrillic anyway? They're probably already labelled in Cyrillic *and* printable ASCII (US QWERTY). Using the Cyrillic labels for no

Re: Emoji for major planets at least?

2018-01-19 Thread Richard Wordingham via Unicode
On Fri, 19 Jan 2018 02:12:04 +0100 Pierpaolo Bernardi via Unicode wrote: > On Fri, Jan 19, 2018 at 1:19 AM, Aleksey Tulinov via Unicode > wrote: > > Perhaps we all shall stop being ironical to each other, calm down, > > sit and discuss how to encode 3D animated emojies (animojies) in > > Unicode

Re: 0027, 02BC, 2019, or a new character?

2018-01-21 Thread Richard Wordingham via Unicode
On Sun, 21 Jan 2018 13:49:46 +0100 Philippe Verdy via Unicode wrote: > But there's NO standard keyboard in Kazakhstan with the Latin > alphabet. Those you'll find are cyrillic keyboards with a way to type > basic Latin. Or keyboards made for other countries. I believe we're talking about physica
