Re: unicode Digest V12 #108

2011-07-06 Thread Ken Whistler
On 7/6/2011 11:18 AM, Asmus Freytag wrote: The Danes, over a decade ago, when they made the official recommendation to use SHY appear to have come to the conclusion that "AA" can never occur accidentally, except at word division in compounds. Not really a safe conclusion. :) http://da.wikiped

Re: What are the issues in having U+FB06 fold to U+FB05?

2011-07-06 Thread Ken Whistler
On 7/6/2011 1:40 PM, Mark Davis ☕ wrote: The other two are special cases; they casefold together because of the way that the full case mapping is computed. Their equivalence is normally captured by a canonical-equivalent folding. Because the simple

Re: Proposed Update UAXes for Unicode 6.1

2011-07-08 Thread Ken Whistler
On 7/8/2011 10:26 AM, Philippe Verdy wrote: This is not strictly related to this Unicode version update, but I have an interesting question about the Unicode Stability Policy. Summary: How does it apply to the exact value (or aliases) of the property "Decomposition Type" (dt), for compat

Re: Unicode 7.0 goals and ++

2011-07-11 Thread Ken Whistler
On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote: For the long term, I suggest Unicode should aim for this: Unicode 6.5 should claim: There will be a *Unicode dictionary*, limiting and reducing ambiguous semantics within Unicode (Background: e.g. the word "character" will have one single cri

Re: Definition of character

2011-07-13 Thread Ken Whistler
On 7/13/2011 12:45 AM, Jukka K. Korpela wrote: For one thing, defining “Unicode character” as a technical term and using it consistently makes it possible to formulate clearly its relation to “character” in the common meaning, thereby helping people to understand and use Unicode better. Well,

Re: Definition of character

2011-07-13 Thread Ken Whistler
On 7/13/2011 1:23 PM, Jukka K. Korpela wrote: I don’t see that biologists use the word “life” in any confusing manner comparable to the Unicode confusion around “character.” “Life” isn’t really a central concept in biology, and its use in biology hardly differs much from everyday use. Defining

Re: Definition of character

2011-07-13 Thread Ken Whistler
Since Jukka seemed to take issue with my responding to his proffered definitions by instead bringing up an analogy between "life" and "character", I'll try responding directly to the attempted clarifications. On 7/13/2011 12:45 AM, Jukka K. Korpela wrote: That’s a completely different issue. Th

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Ken Whistler
On 7/15/2011 11:36 AM, Michael Everson wrote: Look at Figures 8-1 through 8-4 in the Unicode Standard 5.0. We see graphic characters shown, one representing space and two representing joiners. This is plain text. Bt. Thanks for playing! But the correct answer i

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Ken Whistler
On 7/15/2011 1:31 PM, Julian Bradfield wrote: On 2011-07-15, Leo Broukhis wrote: On Fri, Jul 15, 2011 at 12:04 PM, John W Kennedy wrote: Those of us old enough to recall IBM's old 6-bit BCDIC code (a retronym -- it was known as "BCD" in its own day) will remember the overstricken b/ characte

Productive Glyph Design vs. Productive Character Representation (was: Re: Quick survey of Apple symbol fonts ... )

2011-07-18 Thread Ken Whistler
[changing the thread title to disentangle this issue from the Apple symbol font discussion] On 7/16/2011 1:08 AM, Julian Bradfield wrote: The other two could be proposed as unitary symbols, if anybody really needs to represent them. They are commensurate with a large number of similar symbols

Re: Gaps in Brahmic scripts section of SMP

2011-08-08 Thread Ken Whistler
On 8/2/2011 3:26 PM, stas624-...@yahoo.com wrote: [Mainly aimed at people who can change roadmaps] [I used online feedback form, but got no response, so reposting it here.] Your feedback was forwarded to the roadmap committee, which will consider it in the context of other requests and suggesti

Re: How is NBH (U0083) Implemented?

2011-08-08 Thread Ken Whistler
On 8/1/2011 7:26 AM, Naena Guru wrote: This thread wandered off into an argument about whether U+FEFF ZWNBSP or U+2060 WJ is best supported and which should be used to inhibit line breaks. However, there are still several other issues which bear addressing in Naena Guru's questions: The Unicod

Re: on proposed new Arab script characters for African lanugages (n3882)

2011-08-12 Thread Ken Whistler
On 8/12/2011 3:19 PM, Lorna Priest wrote: Our original proposal had these unified, but for various reasons we were asked to disunify them. Lorna Original Message Subject: on proposed new Arab script characters for African lanugages (n3882) From: mmarx To: unicode@unicode.o

Re: Proposed new characters updated in Pipeline Table

2011-08-15 Thread Ken Whistler
On 8/15/2011 10:38 AM, Philippe Verdy wrote: Unicode cannot encode a combining Wasla (because of various stability policies), so if Syriac needs a Wasla to be shown only over a letter or two, one needs to propose precomposed characters for them. Just like the existing Arabic Alef-Wasl

Re: Greek Characters Duplicated as Latin

2011-08-15 Thread Ken Whistler
On 8/15/2011 8:50 AM, Andreas Prilop wrote: > The Ohm sign should have been encoded as another example of "squared" > letters and abbreviations. It comes from Asian character sets, I’d say the ohm sign comes from the MacRoman character set (0xBD). http://www.unicode.org/Public/MAPPINGS/VENDO

Re: C1 Control Pictures Proposal

2011-08-17 Thread Ken Whistler
In general, I agree with Doug Ewell's assessment. I don't see a convincing case here for the need to encode more control picture characters for C1 controls. There seems to be a confusion here between the need for glyphs and the need for characters. Also, this would seem to me to be a receding hori

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Ken Whistler
On 8/19/2011 2:07 PM, Doug Ewell wrote: Technically, I think 10646 was always limited to 32,768 planes so that one could always address a code point with a 32-bit signed integer (a nod to the Java fans). Well, yes, but it didn't really have anything to do with Java. Remember that Java wasn't r
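
The arithmetic behind the "32,768 planes" figure can be checked with a short sketch. The group/plane/row/cell names and the figure of 128 groups are assumptions taken from the original four-octet 10646 layout as commonly described, not from this message.

```python
# Sketch of the original ISO/IEC 10646 four-octet layout (assumed here):
# a code position is (group, plane, row, cell), one octet each, with the
# top bit of the group octet unused.
CELLS_PER_ROW = 256
ROWS_PER_PLANE = 256
PLANES_PER_GROUP = 256
GROUPS = 128  # assumption: 7 usable bits in the group octet

code_points_per_plane = CELLS_PER_ROW * ROWS_PER_PLANE   # 65,536
total_planes = GROUPS * PLANES_PER_GROUP                 # 32,768
total_code_points = total_planes * code_points_per_plane
largest_code_point = total_code_points - 1

print(total_planes, hex(largest_code_point))
```

Under these assumptions the largest value is 0x7FFFFFFF, which is also the largest positive value of a 32-bit signed integer.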

Re: Code pages and Unicode

2011-08-19 Thread Ken Whistler
On 8/19/2011 2:53 PM, Benjamin M Scarborough wrote: Whenever somebody talks about needing 31 bits for Unicode, I always think of the hypothetical situation of discovering some extraterrestrial civilization and trying to add all of their writing systems to Unicode. I imagine there would be litt

Re: RTL PUA?

2011-08-19 Thread Ken Whistler
On 8/19/2011 5:50 PM, Asmus Freytag wrote: If there was a group that got together and developed the necessary protocol, and then found that there's some provision in the Unicode standard that provides an undue limitation on some use of private use characters for which there's a demonstrated dem

Re: Code pages and Unicode

2011-08-22 Thread Ken Whistler
On 8/22/2011 9:58 AM, Jean-François Colson wrote: I wonder whether you aren’t a little too optimistic. No. If anything I'm assuming that the folks working on proposals will be amazingly assiduous during the next decade. Have you considered the unencoded ideographic scripts? Why, yes I have

ALM (was: Re: RTL PUA?)

2011-08-22 Thread Ken Whistler
On 8/21/2011 3:31 PM, Richard Wordingham wrote: I expect ARABIC LANGUAGE MARK would not go down well - has it already been proposed and rejected?. ARABIC *LETTER* MARK, not *LANGUAGE* mark. (And suggested to just be renamed to "AL MARK".) Proposed? Yes. Discussed? Yes. Rejected? No. The las

Re: Code pages and Unicode

2011-08-22 Thread Ken Whistler
On 8/22/2011 3:15 PM, Richard Wordingham wrote: On Monday 22 August 2011, Andrew West wrote: > > > Can anyone think of a way to extend UTF-16 without adding new > > surrogates or inventing a new general category? > > > > Andrew > > How about a triple sequence of two high surrogate

Re: Code pages and Unicode

2011-08-24 Thread Ken Whistler
On 8/24/2011 10:48 AM, Richard Wordingham wrote: Those are two different claims. 'Never say never' is a useful maxim. So is "Leave well enough alone." The problem would be in using maxims instead of an analysis of engineering requirements to drive architectural decisions. The extension of U

Re: Code pages and Unicode

2011-08-24 Thread Ken Whistler
On 8/24/2011 3:51 PM, Richard Wordingham wrote: Well, in that case, the correct action is to work to ensure that code points are not squandered. Have there not already been several failures on that front? The BMP is littered with concessions to the limitations of rendering systems - precompo

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-26 Thread Ken Whistler
On 8/26/2011 3:13 PM, Philippe Verdy wrote: Isn't there an intersection between NameAliases.txt proposed in PRI202, and the informational table defined for UTR #25 at http://www.unicode.org/Public/math/revision-12/MathClassEx-12.txt which also lists other name aliases for other standards ? No.

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-26 Thread Ken Whistler
On 8/26/2011 5:01 PM, Philippe Verdy wrote: "we could as well include..." are dangerous words here. Going encyclopedic is *completely* at odds with the normative intention of NameAliases.txt. Your statement then contradicts what PRI 202 says: "the intent is to add various standard and de fact

Re: Continue: Glaring Mistake in the Code List of South Asian Script, Reply to Daug Ewell and Others

2011-09-12 Thread Ken Whistler
On 9/12/2011 9:13 AM, Philippe Verdy wrote: Well, wasn't the ISCII standard naming the script "Bengali"? It also gave the name "Assamese", but was it a synonym or did it require a separate codepage switching code ? They were separate. Annex A of ISCII 1991 shows Bengali ("BNG") and Assamese (

Re: Noticed improvement in the Code chart link http://www.unicode.org/charts/

2011-09-28 Thread Ken Whistler
On 9/28/2011 12:12 PM, delex r wrote: Not possible. Character and block names cannot be changed once they are assigned. It's two decades too late to make that change. The most that can be done now is adding a few annotations for Assamese. —Ben Scarborough >...It's two decades too late to mak

Re: definition of plain text

2011-10-14 Thread Ken Whistler
On 10/13/2011 10:49 PM, Peter Cyrus wrote: Is there a definition or guideline for the distinction between plain text and rich text? I think where you may be getting hung up is trying to define plain text versus rich text in terms of the content and/or appearance of the text (i.e. the outcome)

Re: definition of plain text

2011-10-14 Thread Ken Whistler
On 10/14/2011 11:47 AM, Joó Ádám wrote: Peter asked for what the Unicode Consortium considers plain text, i.e. what principles it applies when deciding whether to encode a certain element or aspect of writing as a character. In turn, you thoroughly explained that plain text is what the Unicode Con

Re: definition of plain text

2011-10-17 Thread Ken Whistler
On 10/17/2011 1:23 AM, Peter Cyrus wrote: Perhaps the idea of something embedded in the text that then controls the display of the subsequent run of text is the very definition of "markup", whether or not that markup is a special character or an ASCII sequence like or. Yep. And FWIW, rather t

Re: Yiddish digraphs

2011-10-19 Thread Ken Whistler
On 10/19/2011 12:08 PM, Mark E. Shoulson wrote: I think the issue here is (probably) a matter of legacy encodings, though someone else would need to confirm that. O.k., as self-appointed historian of the standard, I guess I need to be the one to answer that. ;-) The Yiddish digraphs were added

Re: Default bidi ranges

2011-11-09 Thread Ken Whistler
On 11/9/2011 9:30 AM, Asmus Freytag wrote: On 11/9/2011 1:18 AM, "Martin J. Dürst" wrote: I tried to find something like a normative description of the default bidi class of unassigned code points. In UTR #9, it says (http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Type

Economic Self-Interest (was: Re: combining: half, double, triple et cetera ad infinitum)

2011-11-14 Thread Ken Whistler
On 11/14/2011 2:39 PM, Naena Guru wrote: On the other hand, no company would send people to work at Unicode if they did not have an economic interest. One might as well rephrase that as: No company would send people to work at *any standard* if they did not have an economic interest. And yo

Re: missing characters: combining marks above runs of more than 2 base letters

2011-11-18 Thread Ken Whistler
On 11/17/2011 11:28 PM, Philippe Verdy wrote: Could the Unicode text specify that a left half mark, when it is followed by a right half-mark on the same line, has to be joined ? And which character can we select in a font to mark the intermediate characters between them ? No. This kind of stuf

Re: missing characters: combining marks above runs of more than 2 base letters

2011-11-18 Thread Ken Whistler
On 11/18/2011 11:21 AM, Peter Cyrus wrote: Ken, you mention "defined markup constructions", but nothing would prevent specialized rendering software from, for example, connecting a left half mark with the corresponding right half mark via titlo, even though the text is still only plain text wit

Re: more flexible pipeline for new scripts and characters

2011-11-18 Thread Ken Whistler
On 11/18/2011 1:30 PM, Karl Williamson wrote: How is this different from Named sequences, which are published provisionally? Named sequences aren't character properties. When a newly encoded character is published in the standard, its code point, its name, and dozens of other properties all ha

Re: missing characters: combining marks above runs of more than 2 base letters

2011-11-18 Thread Ken Whistler
On 11/18/2011 5:24 PM, Philippe Verdy wrote: This arc in the example is definitely NOT mathematics Nor did I say it was. (even if you have read a version where it was attempted to represent it using a Math TeX notation in this page, an obvious error because it used an angular \widehat and not

Re: missing characters: combining marks above runs of more than 2 base letters

2011-11-18 Thread Ken Whistler
On 11/18/2011 5:36 PM, Philippe Verdy wrote: I have absolutely no clear way to represent sequences like in this example that use such elongated diacritic applied to runs of more than two characters. Nor should you expect to be able to represent such things in plain text. Such conventions are n

Re: name change

2011-11-22 Thread Ken Whistler
On 11/22/2011 11:02 AM, a...@peoplestring.com wrote: In one of the discussions in this community, it was stated that once assigned, the name of a character cannot be changed. But I have noticed some characters have their name changed eg 'ARABIC LETTER YEH BARREE' (U+06D2) was previously named 'AR

Re: Question on UCA collation parameters (strength = tertiary, alternate = shifted)

2011-11-29 Thread Ken Whistler
On 11/29/2011 11:11 AM, Matt Ma wrote: Does Shifted implies strength being quaternary? If strength stays as tertiary (default or explicitly set), it seems the collation behavior is Blanked. Please clarify. No. Shifted is a particular strategy for handling the "variable collation elements" (sta

Re: Archaic Pashto letter

2011-12-09 Thread Ken Whistler
On 12/9/2011 9:06 AM, Andreas Prilop wrote: Arabic letter U+0682 shows two dots above. It has the cryptic remark "not used in modern Pashto". But was it ever used? To understand where the "cryptic" remark came from, you need to know more about the history of the character in the standard. U+06

Re: Upside Down Fu character

2012-01-09 Thread Ken Whistler
On 1/9/2012 12:23 PM, Asmus Freytag wrote: So, my question remains, are there any other avenues besides hot-metal printed text I assume that was an exaggeration for rhetorical effect -- since hot-metal printing technology went out half a century ago, replaced first by phototypesetting and then

Re: Tag characters

2015-05-27 Thread Ken Whistler
Doug, Read on in the minutes to the next day. 143-C27 and related actions. There are a few things to keep in mind here. 1. The un-deprecation of the tags U+E0020..U+E007E *is* part of the UCD for Unicode 8.0. The change has already taken place in the revised beta files now posted (see PropList.

Re: Arrow dingbats

2015-05-28 Thread Ken Whistler
Michel Suignard (editor of ISO/IEC 10646) responded to these questions, but let me augment his response with some more detailed history here. (Pardon the length of the reply, but these things tend never to be as simple as people assume and hope they are.) On 5/28/2015 2:08 PM, Chris wrote: So i

Re: Some questions about Unicode's CJK Unified Ideograph

2015-05-29 Thread Ken Whistler
On 5/29/2015 5:20 PM, gfb hjjhjh wrote: 1. I have seen a chinese character ⿰言亜 from a Vietnamese dictionary NHAT DUNG THUONG DAM DICTIONARY** So, a.) In http://www.unicode.org/alloc/Pipeline.html , it show that CJK Extension E and F have already been accepted, but where can I check tho

Re: Tag characters and in-line graphics (from Tag characters)

2015-06-02 Thread Ken Whistler
On 6/2/2015 2:01 AM, William_J_G Overington wrote: Local glyph memory, for use in compressing a document where the same glyph is used two or more times in the document: Um, that technology already exists. It is called a "font". A mechanism to be able to use the method to define a glyph li

Custom characters (was: Re: Private Use Area in Use)

2015-06-03 Thread Ken Whistler
On 6/3/2015 5:17 PM, John wrote: so what? There should be a standard way to put custom characters anywhere that characters belong and have things “just work”. Well, that's the rub, isn't it? We (in IT) are still working pretty dang hard on the simpler problem, to wit: There shoul

Unicode Terms of Use Clarification (was: Re: free download of ISO/IEC 10646)

2015-06-11 Thread Ken Whistler
mit* free use of the data and specifications in the development of products, but to discourage attempts to use the data in nonconformant or otherwise misleading implementations that would undermine the intended open interoperability of the Unicode Standard for all. Clear? --Ken Whistler, Technic

Re: trying to understand the relationship between the Version 1 Hangul syllables and the later versions'

2015-06-19 Thread Ken Whistler
Karl, As usual, the situation is way more complicated than perhaps it has any business being! It isn't just Version 1 Hangul that have to be considered, but also Version 1.1 Hangul. Version 1.0 contained 2350 Hangul syllables, encoded in the range 3400..3D2D. Version 1.1 contained 6646 H

Re: Why aren't the emoji modifiers GCB=Extend?

2015-06-19 Thread Ken Whistler
Karl, This results from the fact that the fallback behavior for the modifiers is simply as independent pictographic blorts, i.e. the color swatch images. That is also related to why they are treated as gc=Sk symbol modifiers, rather than as combining marks or format characters. If you *support*

Re: trying to understand the relationship between the Version 1 Hangul syllables and the later versions'

2015-06-24 Thread Ken Whistler
n standards from the early 1990's might know, however. --Ken On 6/24/2015 1:03 PM, Karl Williamson wrote: On 06/19/2015 04:12 PM, Ken Whistler wrote: The Unicode 2.0 set of 11,172 was known as the "Johab" set from KS C 5601-1992. That was an algorithmically designed replacement
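
The figure of 11,172 follows from the algorithmic layout of the modern Hangul syllable block. The sketch below assumes the arithmetic given in the Unicode Standard's conformance chapter (base U+AC00, 19 leading consonants, 21 vowels, 28 trailing slots including "none"); none of those constants appear in this message, and the function name is only illustrative.

```python
# Assumed constants for the modern Hangul syllable block.
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T index 0 means "no trailing consonant"

def compose(l_index, v_index, t_index=0):
    """Return the code point of the syllable built from jamo indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index

syllable_count = L_COUNT * V_COUNT * T_COUNT
first = compose(0, 0, 0)
last = compose(L_COUNT - 1, V_COUNT - 1, T_COUNT - 1)

# The Version 1.0 range 3400..3D2D quoted earlier in this thread can be
# checked the same way: it holds exactly 2350 positions.
v1_count = 0x3D2D - 0x3400 + 1

print(syllable_count, hex(first), hex(last), v1_count)
```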

Re: Adding RAINBOW FLAG to Unicode

2015-06-29 Thread Ken Whistler
Noah, Additional information you should have is that the UTC is about to publish a new Public Review Issue on the topic of an extended mechanism for the representation of more flag emoji with sequences of tag characters. (Note: *not* representation as encoded single character symbols.) That PRI,

Re: Adding RAINBOW FLAG to Unicode

2015-07-02 Thread Ken Whistler
On 7/2/2015 2:01 AM, Philippe Verdy wrote: The frozen status of Antarctica ... ... will be addressed separately by global warming. But be that as it may... In really there's still no standard way to encode flags unambiguously and in a stable way. We'd like to have FOTW (Flags of the World

Re: Adding RAINBOW FLAG to Unicode

2015-07-02 Thread Ken Whistler
On 7/2/2015 12:33 PM, Leo Broukhis wrote: If REGIONAL INDICATOR DASH and REGIONAL INDICATOR digits are added, along with regional supplementary symbols, then sequences * can be parsed unambiguously as ISO 3166-2, whereas + can be parsed as a named sequence signifying a flag of a non-governmenta

Re: Adding RAINBOW FLAG to Unicode

2015-07-03 Thread Ken Whistler
On 7/2/2015 5:56 PM, Peter Constable wrote: Erkki, in this case, I think Philippe is making valid points. -For the proposal to be workable requires some means of ensuring stability of encoded representations. The way this would be done would be for CLDR to provide data with all valid sequen

Re: PRI #299

2015-07-03 Thread Ken Whistler
On 7/3/2015 9:14 PM, Leo Broukhis wrote: On Fri, Jul 3, 2015 at 12:50 PM, Doug Ewell wrote: Leo Broukhis wrote: What I don't like about PRI #399 is its proposing to use default- ignorable characters. On a non-vexillology-aware platform, I'd like to see something informative, albeit not res

Re: Adding RAINBOW FLAG to Unicode

2015-07-06 Thread Ken Whistler
On 7/6/2015 8:26 AM, Doug Ewell wrote: Ken Whistler wrote: In that case I think a new registry mechanism might in fact make sense -- and I have spelled out details of how one could reasonably work in conjunction with the extended flag tag proposal in feedback submitted on PRI #299. Is

Re: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input)

2015-07-23 Thread Ken Whistler
On 7/23/2015 3:00 AM, Frédéric Grosshans wrote: By the way, I think a comment should be added in the §4.7 of the standard to clarify that the BidiMirrored property is not intended for cases like hieroglyphs or italic. This eminently sensible suggestion has been passed along to the Unico

Re: BidiMirrored property and ancient scripts (Was Re: Plain text custom fraction input)

2015-07-24 Thread Ken Whistler
On 7/24/2015 2:59 AM, Frédéric Grosshans wrote: Is that better ? Once again, I agree that forbidding ancient Egyptian to be mirrored when “stupid and dangerous” I can see that this thread seems to have gone off the rails a bit. The Unicode Standard does not forbid Egyptian hieroglyphs from

Re: Update on flag tags (PRI #299)?

2015-08-13 Thread Ken Whistler
Doug, On 8/13/2015 7:58 AM, Doug Ewell wrote: The recently posted minutes from UTC #144 include the following: B.11.1.1.3 PRI 299 feedback and mailing list discussion [Edberg, L2/15-210] Discussion. UTC took no action at this time. and: [144-A93]

Re: Chess symbol glyphs in code charts

2015-08-14 Thread Ken Whistler
Garth, The glyphs for the chess symbols in the 26XX block date from Unicode 3.0. Most of the symbols redesigned for the Unicode 3.0 charts were done by John M. Fiscella. (See the font acknowledgements on p. iv of Unicode 3.0.) I do not know which predecessor designs Fiscella might ultimately have

Re: APL Under-bar Characters

2015-08-16 Thread Ken Whistler
It seems to me that APL has some very deeply embedded (and ancient) assumptions about fixed-width 8-bit characters, dating from ASCII days. It only got as far as it did with the current assumptions because people hacked up 8-bit fonts for all the special characters for the APL syntax, and because

Re: Standardised Variation Sequences with Toggles

2015-08-16 Thread Ken Whistler
On 8/16/2015 3:20 AM, Richard Wordingham wrote: The view of the Unicode Technical committee appears to be that the Unicode Character Database (UCD) takes priority over the core text of the Unicode Standard in case of conflict. (Please advise if I have misunderstood; I only have the core text a

Re: APL Under-bar Characters

2015-08-16 Thread Ken Whistler
Alex, On 8/16/2015 12:41 PM, alexwei...@alexweiner.com wrote: As far as I know, APL definitely predates the Unicode consortium. Do you think that The Consortium possibly overlooked the pre-existing under-bar character set? The answer to that is no. Initially, Unicode 1.0 attempted to pu

Re: APL Under-bar Characters

2015-08-18 Thread Ken Whistler
Returning to a historical note on the glyphic forms and the question of combining low lines or combining macrons below... admittedly a side note on this thread, the *original* identification of these APL uppercase Latin letters, at least in their IBM implementations, was clearly as uppercase (ital

Re: APL Under-bar Characters

2015-08-18 Thread Ken Whistler
On 8/18/2015 9:23 AM, Doug Ewell wrote: Tom Gewecke wrote: I guess the question is whether having a named sequence would somehow make it easier for the gnu apl folks to add something to their system so that their string length function sees such a sequence as having a length of "1"? I don't

Re: APL Under-bar Characters

2015-08-18 Thread Ken Whistler
On 8/18/2015 9:45 AM, Doug Ewell wrote: Ken Whistler wrote: Then we're back to the central point that Alex Weiner originally expressed, in arguing for the encoding of precomposed letters with underbar: The string length functionality would view an 'A' code point combined w
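
The length question can be made concrete with a small sketch. It is not taken from the thread: it assumes the under-barred letter is represented as a base letter followed by U+0332 COMBINING LOW LINE, and it counts only code points whose canonical combining class is zero, which is a rough approximation and not full grapheme cluster segmentation.

```python
import unicodedata

def code_point_length(text):
    """Length as Python sees it: one unit per code point."""
    return len(text)

def base_length(text):
    """Approximate 'visible' length: skip code points with a nonzero
    canonical combining class (an assumption made for this sketch)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

underbarred_a = "A\u0332"  # 'A' followed by COMBINING LOW LINE

print(code_point_length(underbarred_a), base_length(underbarred_a))
```

A string function built on the second count would report 1 for the sequence without any precomposed character being encoded.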

Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-08-19 Thread Ken Whistler
I don't think that is the issue. U+FF9E/F are already lb=NS, which prevents line breaks before. The issue is instead loosening up the lb class for the halfwidth katakana syllables (from lb=AL to lb=ID), so that they *can* break the way the regular katakana syllables do. --Ken On 8/19/2015 9:21 A

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Ken Whistler
On 9/29/2015 10:30 AM, Sean Leonard wrote: On 9/29/2015 9:40 AM, Daniel Bünzli wrote: I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like, the simplest one being "scalar values greater

Scope of Unicode Character Properties (was: Re: Deleting Lone Surrogates)

2015-10-05 Thread Ken Whistler
Section 3.5, Properties, of the standard attempts to address this. "Code point properties" are properties of the code points, per se, and clearly do have all code points (U+0000..U+10FFFF) in their scope. An example is the Surrogate code point property, which wouldn't make much sense if it didn't
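
As an illustration of a property whose scope is every code point, the sketch below queries General_Category through Python's `unicodedata` module. The choice of General_Category and of the three sample code points is an assumption made for the example, not something stated in this message.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF

def general_category(code_point):
    """Return the General_Category value for any code point in range."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return unicodedata.category(chr(code_point))

samples = {
    0x0041: general_category(0x0041),      # an assigned letter
    0xD800: general_category(0xD800),      # a surrogate code point
    0x10FFFF: general_category(0x10FFFF),  # a noncharacter
}
print(samples)
```

Surrogates and noncharacters still get a value (Cs and Cn respectively), which is the sense in which such properties cover the whole code space.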

Why Nothing Ever Goes Away (was: Re: Acquiring DIS 10646)

2015-10-05 Thread Ken Whistler
On 10/5/2015 8:24 AM, Doug Ewell wrote: I too am puzzled as to what DIS 10646 and C1 control pictures have to do with each other. What an *excellent* cue to start a riff on arcane Unicode history! First, let me explain what I think Sean Leonard's concern here is. 1. On 10/4/2015 5:30 AM, Se

Re: Counting Codepoints

2015-10-11 Thread Ken Whistler
On 10/11/2015 2:20 PM, Richard Wordingham wrote: Is the number of codepoints in a UTF-16 string well defined? For example, which of the following two statements are true? (a) The ill-formed three code-unit Unicode 16-bit string <0xDC00, 0xD800, 0xDC20> contains two codepoints, U+DC00 and U+10
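
One reading of statement (a) can be sketched as follows: a well-formed surrogate pair counts as one supplementary code point, and any unpaired surrogate counts as a code point by itself. This is only an illustration of that reading, with a hypothetical function name, and not the answer given in the thread.

```python
def code_points(units):
    """Map a sequence of 16-bit code units to code points, keeping
    unpaired surrogates as individual (surrogate) code points."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Standard surrogate-pair arithmetic.
            result.append(0x10000 + ((unit - 0xD800) << 10)
                          + (units[i + 1] - 0xDC00))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result

decoded = code_points([0xDC00, 0xD800, 0xDC20])
print([hex(cp) for cp in decoded])
```

Under this reading the three code units yield two code points: the lone low surrogate, then the supplementary code point formed by the pair.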

Re: Emoji data in UCD xml ?

2015-10-29 Thread Ken Whistler
There has been some preliminary discussion of this. The problem is that the data in emoji-data.txt has not yet been formally rationalized into a coherent set of Unicode character properties. The UTC would first need to determine exactly what property (or list of properties) is involved, before inc

Re: Unicode in the Curriculum?

2016-01-06 Thread Ken Whistler
Actually, ASCII should *not* be ignored or deprecated. We *love* ASCII. The issue is just making sure that students understand that the *true name* of "ASCII" is "UTF-8". It is just the very first 128 values that open into the entire world of Unicode characters. It is a mind trick to play on you
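
The claim that ASCII is just the first 128 values of UTF-8 is easy to check. The sketch below uses Python's built-in codecs; the variable names are illustrative only.

```python
# Every byte value 0x00..0x7F decodes to the same text under both codecs.
ascii_bytes = bytes(range(128))
as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")

same_text = as_ascii == as_utf8
round_trip = as_utf8.encode("utf-8") == ascii_bytes

# Anything beyond that range takes more than one byte in UTF-8.
non_ascii = "\u00e9".encode("utf-8")  # LATIN SMALL LETTER E WITH ACUTE

print(same_text, round_trip, len(non_ascii))
```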

Re: Case for letters j and J with acute

2016-02-09 Thread Ken Whistler
On 2/9/2016 1:23 PM, David Faulks wrote: Perhaps Unicode could create a ‘default position’ property for combining characters, and encourage OpenType and other font engines to adopt it for automatic use when no other font information is provided. Adoption would take a while, but I cannot help

Re: Case for letters j and J with acute

2016-02-09 Thread Ken Whistler
Asmus, On 2/9/2016 2:19 PM, Asmus Freytag (t) wrote: On 2/9/2016 1:36 PM, Ken Whistler wrote: On 2/9/2016 1:23 PM, David Faulks wrote: Perhaps Unicode could create a ‘default position’ property for combining characters, and encourage OpenType and other font engines to adopt it for

Re: Additional decompositions in decomps.txt

2016-02-22 Thread Ken Whistler
Eli, You're not missing anything. This is a bug in the documentation of decomps.txt. Initially, added decompositions for the DUCET default weights were all tagged as . This results in a distinct *tertiary* weight in the initial collation weight values in DUCET. Later on, there turned up cases whe

Re: Additional decompositions in decomps.txt

2016-02-22 Thread Ken Whistler
Yes, that is correct. --Ken On 2/22/2016 11:10 AM, Eli Zaretskii wrote: OK, thanks. So conceptually, all those additional decompositions are all in the same class as those tagged "", in that they don't originate from the UCD, but were added for collation purposes, is that correct?

Just so story: Why isn't o-slash decomposed? (was: Re: Character folding in text editors)

2016-02-22 Thread Ken Whistler
mark's glyphs don't overlap those of the basic character. Is that correct? This sounds like a great question for Ken Whistler. ☺ Well, with a softball pitch like that one... ;-) The basics are described in TUS 8.0, Section 2.12, Equivalent Sequences, on p. 65, in "Non-decomposi

Re: Purpose of and rationale behind Go Markers U+2686 to U+2689

2016-03-09 Thread Ken Whistler
I don't know the answer to this. But I suspect that the source was from one of the collection of fonts associated with the STIX project research that led to the collection of mathematical symbols additions noted in L2/01-067 (superseded by L2/01-142), as well as the earlier mathematical symbo

NamesList.txt as data source (was: Re: Gaps in Mathematical Alphanumeric Symbols)

2016-03-10 Thread Ken Whistler
On 3/10/2016 1:00 PM, Andrew West wrote: It (http://www.unicode.org/Public/UNIDATA/NamesList.txt) is machine-readable, although the file specifically warns that "this file should not be parsed for machine-readable information". NamesList.txt is just a structured text file, so of course it is

Re: NamesList.txt as data source

2016-03-11 Thread Ken Whistler
On 3/11/2016 9:37 AM, Oren Watson wrote: Ok, so let me see if I understand this correctly. Suppose I'm writing an editor for math equations, and I want the user to be able to press a "Doublestruck" button and then type a C or D to get a ℂ or 𝔻 respectively. There is apparently no official sou

Re: Proposal for *U+23FF SHOULDERED NARROW OPEN BOX?

2016-03-14 Thread Ken Whistler
U+23FF is already assigned to OBSERVER EYE SYMBOL, which is already under ballot for 10646 (and approved by the UTC). http://www.unicode.org/alloc/Pipeline.html Please always first check that page before suggesting code points for prospective new characters. --Ken On 3/12/2016 5:42 PM, Marcel

Re: annotations

2016-03-14 Thread Ken Whistler
On 3/13/2016 12:03 PM, Doug Ewell wrote: My point is that of J.S. Choi and Janusz Bień: the problem with declaring NamesList off-limits is that it does contain information that is either: • not available in any other UCD file, or • available, but only in comments (like the MAS mappings), whic

Re: Canonical block names: spaces vs. underscores

2016-05-26 Thread Ken Whistler
On 5/26/2016 1:17 AM, Mathias Bynens wrote: `Blocks.txt` (http://unicode.org/Public/UNIDATA/Blocks.txt) lists blocks such as `Cyrillic Supplement`. However, `PropertyValueAliases.txt` (http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt) refers to this block as `Cyrillic_Supplement`,

Re: Canonical block names: spaces vs. underscores

2016-05-26 Thread Ken Whistler
e* certain other scripts that parse the UCD. On 26 May 2016, at 18:03, Ken Whistler wrote: […] "canonical block name" is not a defined term in the standard. I didn’t mean to imply it was — it’s just an English word. I meant “canonical” as in “without loose matching applied”. Ah,

Re: UAX44: loose matching of symbolic values and the `is` prefix

2016-06-06 Thread Ken Whistler
On 6/6/2016 12:58 AM, Mathias Bynens wrote: Backwards compatibility seems to be the only good reason to continue supporting the `is` prefix *for existing implementations*, such as the one in Perl. But why is it still a requirement for new engines to support it as part of UAX44-LM3? I’d like to
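
For readers unfamiliar with the rule under discussion, here is a simplified Python sketch of loose matching for symbolic values as I understand UAX44-LM3: ignore case, whitespace, underscores, hyphens, and an initial `is` prefix. The helper names `loose_key` and `loose_match` are hypothetical, and the sketch deliberately skips any edge cases the UAX may call out, so treat it as an illustration rather than a conformant implementation.

```python
import re


def loose_key(symbolic_value: str) -> str:
    """Reduce a property name or value alias to a comparison key.

    Simplified reading of UAX44-LM3: drop whitespace, underscores and
    hyphens, fold case, then drop a leading "is" if anything follows it.
    """
    key = re.sub(r"[\s_\-]", "", symbolic_value).lower()
    if key.startswith("is") and len(key) > 2:
        key = key[2:]
    return key


def loose_match(a: str, b: str) -> bool:
    """True if two symbolic values are equal under the simplified rule."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    # The spelling in Blocks.txt versus the one in PropertyValueAliases.txt.
    print(loose_match("Cyrillic Supplement", "Cyrillic_Supplement"))
    # The `is` prefix questioned in this thread.
    print(loose_match("IsGreek", "greek"))
```

Because the same key function is applied to both sides, stripping `is` from a value that merely happens to begin with those letters still compares consistently.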

Re: Mende Kikakui Number 10

2016-06-10 Thread Ken Whistler
On 6/10/2016 2:59 PM, Doug Ewell wrote: How does one represent the values 100 and 1000 in Mende Kikakui? Is it not with ONE + HUNDREDS and ONE + THOUSANDS respectively? If so, then how is encoding 10 as ONE + TENS any different? Am I missing something? Nope, you got it right: 10 = <1E8C7, 1E8D

Re: Mende Kikakui Number 10

2016-06-10 Thread Ken Whistler
On 6/10/2016 3:23 PM, Michael Everson wrote: Mende Kikakui has no ZERO. This is a fault, and they would do well to devise one. An oval with a line through it like Ø would do. But they don’t have this. I concur with that. If the users of this system decide that they want to have a decimal rad

Re: Mende Kikakui Number 10

2016-06-10 Thread Ken Whistler
On 6/10/2016 5:34 PM, Andrew Cunningham wrote: There are too few descriptions of the system for me to be definitive but the number ten seems to hold a unique position within the numeral system. As does the number 10 in every decimal numeral system. ;-) But that doesn't automatically requir

Re: Release date?

2016-06-21 Thread Ken Whistler
Doug, On 6/21/2016 7:43 AM, Doug Ewell wrote: "And tomorrow, June 21, we will have 71 new emojis to play with." Do only bloggers and the press get notified in advance of the release date of Unicode 9.0? They are getting it from the same place all of the members and anybody else could have be

Re: Announcing The Unicode(R) Standard, Version 9.0

2016-06-22 Thread Ken Whistler
On 6/22/2016 3:33 AM, Philippe Verdy wrote: 2016-06-22 0:02 GMT+02:00: Important symbol additions include: * 19 symbols for the new 4K TV standard We were told that this standard is not named "4K" but "UltraHD" (UHD)... "4K" is just a popular

Re: Adding half-star to Unicode?

2016-06-23 Thread Ken Whistler
On 6/23/2016 3:01 PM, Garth Wallace wrote: But precedent is for separate WITH LEFT HALF BLACK and WITH RIGHT HALF BLACK geometric shapes. Also, I'm not sure if the BLACK HALF STAR and STAR WITH LEFT HALF BLACK are entirely interchangeable. I agree. If we are going to do this, a set of 4 geo

Re: Comment in a leading German newspaper regarding the way UTC and Apple handle Emoji as an attack on Free Speech

2016-08-26 Thread Ken Whistler
On 8/26/2016 10:01 AM, John O'Conner wrote: What I find more interesting is how emoji (a small digital image or icon) was ever interpreted as encodable text for the Unicode Standard. If our German newspaper friends have made a mistake in interpreting emoji as speech, I think the Unicode consor

Re: Additional Emoji selection factor: Support by "Major Vendors"

2016-09-15 Thread Ken Whistler
On 9/11/2016 5:40 AM, Christoph Päper wrote: "Took no action" generally means "rejected". Can anyone explain then, why [L2/16-128] seems to have been “rejected” and still made it into selection.html? Not all documents in the UTC document register are born equal. If a document in the registe

Re: Why isn't MUSICAL SYMBOL NULL NOTEHEAD default ignorable?

2016-09-15 Thread Ken Whistler
On 9/5/2016 5:34 PM, Charlotte Buff wrote: It has just come to my attention that U+1D159 MUSICAL SYMBOL NULL NOTEHEAD is not default ignorable, even though it has no visible glyph appearance and no advance width in text, just like the various Hangul jamo fillers that *are* default ignorable. I

Re: Unicode Bidi Algorithm – Java reference implementation

2016-09-18 Thread Ken Whistler
On 9/17/2016 10:26 AM, Deepak Jois wrote: I now need to make the updates to support the changes in Unicode 8.0, and I am finding it a bit hard to grok the changes in C at a glance. The UBA 7.0 --> UBA 8.0 changes were rather subtle. They did not change much about the gross behavior of the al

Re: graphemes

2016-09-20 Thread Ken Whistler
On 9/20/2016 12:30 AM, Julian Bradfield wrote: are all legal spellings of the same word in a writing system, a useful linguistic definition of grapheme should ensure that all three variants have the same number of graphemes. Such a bizarre definition, which would also entail "color/colour", "

Re: My Annual Unicode Questions

2016-10-05 Thread Ken Whistler
On 10/5/2016 7:37 AM, Andre Schappo wrote: Q. Who understands Unicode? A. One student raised his hand. (This is an improvement on last year as no hand was raised last year) A brave soul, indeed! After 27 years of Unicode development, and with the standard (and its accumulated ancillary stan
