Re: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Ken Whistler via Unicode
On 2/21/2020 7:53 AM, Costello, Roger L. via Unicode wrote: Text files may indeed contain binary (i.e., bytes that are not interpretable as characters). Namely, text files may contain newlines, tabs, and some other invisible things. Question: "characters" are defined as only the visible

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Ken Whistler via Unicode
Well, no, in this case "strange" means strange, as Ken Lunde notes. I'm just pointing to his list, because it pulls together quite a few Han characters that *also* have dubious cases for encoding. Or you could turn the argument around, I suppose, and note that just because the hieroglyph for

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Ken Whistler via Unicode
You want "dubious"?! You should see the hundreds of strange characters already encoded in the CJK *Unified* Ideographs blocks, as recently documented in great detail by Ken Lunde: https://www.unicode.org/L2/L2020/20059-unihan-kstrange-update.pdf Compared to many of those, a hieroglyph of a

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Ken Whistler via Unicode
Richard, What it comes down to is avoidance of conundrums involving canonical reordering for normalization. The effect of variation selectors is defined in terms of an immediate adjacency. If you allowed variation selectors to be defined for combining marks of ccc!=0, then normalization of
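A minimal sketch of the adjacency problem, using Python's standard `unicodedata` module. The sequences are hypothetical (no variation sequence is defined for a combining acute accent); they only illustrate that a variation selector has ccc=0 and so blocks the canonical reordering that would otherwise apply to the two marks.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, ccc=220
VS1 = "\ufe00"        # VARIATION SELECTOR-1, ccc=0


def nfd_codepoints(text):
    """Return the NFD form of text as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


# Without a variation selector the two marks are reordered by ccc.
plain = nfd_codepoints("a" + ACUTE + DOT_BELOW)

# Hypothetical sequence: a variation selector attached to the acute.
# The selector is a starter (ccc=0), so the dot below cannot move past it.
with_vs = nfd_codepoints("a" + ACUTE + VS1 + DOT_BELOW)

print(plain)
print(with_vs)
```

In this sketch the two mark orders are canonically equivalent only while no selector sits between them.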

Re: Adding Experimental Control Characters for Tai Tham

2020-01-29 Thread Ken Whistler via Unicode
Richard, Given that those particular two variation selectors have already given very specific semantics for emoji sequences, and would now be expected to occur *only* in emoji sequences: https://www.unicode.org/reports/tr51/#def_text_presentation_selector usurping them to do something

Re: Not accepted by UTC but in ISO ballot?

2019-12-27 Thread Ken Whistler via Unicode
Shriramana, That category is used to track character(s) in process that may have been approved by WG2 but are not yet in ballot, or are in contention, and may have just been dropped from ballot, but which still have sufficient visibility to be tracked. The process is a bit rough around the

Re: Not accepted by UTC but in ISO ballot?

2019-12-26 Thread Ken Whistler via Unicode
Shriramana, On 12/20/2019 6:29 PM, Shriramana Sharma via Unicode wrote: I was looking at the pipeline for something else, and for the first time I see a character category: “not accepted by the UTC but in ISO ballot” and two characters in it. Those two characters changed status as of December

Re: HEAVY EQUALS SIGN

2019-12-20 Thread Ken Whistler via Unicode
On 12/20/2019 7:17 AM, wjgo_10...@btinternet.com via Unicode wrote: It is indeed interesting that the Notice of Non-Approval itself uses italics for emphasis in two places. That text, at the present time, cannot be expressed in Unicode plain text with the emphasis that the Notice of

Re: New Public Review on QID emoji

2019-10-30 Thread Ken Whistler via Unicode
On 10/30/2019 10:41 AM, wjgo_10...@btinternet.com via Unicode wrote: At present I have a question to which I cannot find the answer. Is the QID emoji format, if approved by the Unicode Technical Committee going to be sent to the ISO/IEC 10646 committee for consideration by that committee?

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-12 Thread Ken Whistler via Unicode
On 10/12/2019 3:15 AM, Fred Brennan via Unicode wrote: There seems to be no conscionable reason for such a long delay after the approval. If that's just how things are done, fine, I certainly can't change the whole system. But imagine if you had to wait two years to even have a chance of

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Ken Whistler via Unicode
Sorry about the typo there. I meant "the published Version 13.0 next March" --Ken On 10/11/2019 10:17 AM, Ken Whistler wrote: then eventually in the published Version 13.0 next month:

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Ken Whistler via Unicode
Short answer is no. The characters in the pipeline section labeled "Characters Accepted for Version 13.0" are what will be in the beta review for 13.0 (look for that sometime next month), and then eventually in the published Version 13.0 next month:

Re: On the lack of a SQUARE TB glyph

2019-09-27 Thread Ken Whistler via Unicode
Fred, 2 hours and 33 minutes from now (today). But you don't need to try to synch a proposal like this to a particular script ad hoc meeting. That group meets roughly once a month, and any new proposal coming in right now wouldn't be on the Unicode 13.0 train, even if the UTC immediately

Re: On the lack of a SQUARE TB glyph

2019-09-26 Thread Ken Whistler via Unicode
On 9/26/2019 4:21 AM, Fred Brennan via Unicode wrote: There is a clear demand for a SQUARE TB. In the font SMotoya Sinkai W55 W3, which is ©2008 株式会社 モトヤ, the glyph is unencoded and accessed via the Discretionary Ligatures (`dlig`) OpenType feature. It has name `T_B.dlig`. Aye, there's the

Re: PUA (BMP) planned characters HTML tables

2019-08-14 Thread Ken Whistler via Unicode
On 8/14/2019 4:32 PM, James Kass via Unicode wrote: If a character gets deprecated, can its decomposition type be changed from canonical to compatibility? Simple answer: No. --Ken

Re: New website

2019-07-22 Thread Ken Whistler via Unicode
Your helpful suggestions will be passed along to the people working on the new site. In the meantime, please note that the link to the "Unicode Technical Site" has been added to the left column of quick links in the page bottom banner, so it is easily available now from any page on the new

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Ken Whistler via Unicode
See the entry for "Magar Akkha" on: http://linguistics.berkeley.edu/sei/scripts-not-encoded.html Anshuman Pandey did preliminary research on this in 2011. http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf It would be premature to assign an ISO 15924 script code, pending the research to

Access to the Unicode technical site (was: Re: Unicode's got a new logo?)

2019-07-18 Thread Ken Whistler via Unicode
On 7/18/2019 11:50 AM, Steffen Nurpmeso via Unicode wrote: I also decided to enter /L2 directly from now on. For folks wishing to access the UTC document register, Unicode Consortium standards, and so forth, all of those links will be permanently stable. They are not impacted by the

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-18 Thread Ken Whistler via Unicode
On 7/17/2019 4:54 PM, Philippe Verdy via Unicode wrote: then the Unicode version (age) used for Hieroglyphs should also be assigned to Hieratic. It is already. In fact the ligatures system for the "cursive" Egyptian Hieratic is so complex (and may also have its own variants showing its

Re: Unicode "no-op" Character?

2019-07-03 Thread Ken Whistler via Unicode
On 7/3/2019 10:47 AM, Sławomir Osipiuk via Unicode wrote: Is my idea impossible, useless, or contradictory? Not at all. What you are proposing is in the realm of higher-level protocols. You could develop such a protocol, and then write processes that honored it, or try to convince others

Re: acute-macron hybrid?

2019-04-30 Thread Ken Whistler via Unicode
On 4/30/2019 12:45 AM, Julian Bradfield via Unicode wrote: What is its appropriate Unicode representation? A macron. --Ken

Re: Variation Sequences (and L2-11/059)

2019-03-13 Thread Ken Whistler via Unicode
On 3/13/2019 2:42 AM, Janusz S. Bień via Unicode wrote: Hi! On Mon, Jul 16 2018 at 7:07 +02, Janusz S. Bień via Unicode wrote: FAQ (http://unicode.org/faq/vs.html) states: For historic scripts, the variation sequence provides a useful tool, because it can show mistaken or nonce

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Ken Whistler via Unicode
Egmont, On 2/9/2019 11:48 AM, Egmont Koblinger via Unicode wrote: Are there any (non-CJK) scripts for which crossword puzzles don't exist? There are crossword puzzles for Hindi (in the Devanagari script). Just do an image search for "Hindi crossword puzzle". But the conventions for these

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Ken Whistler via Unicode
Richard, On 2/1/2019 1:30 PM, Richard Wordingham via Unicode wrote: Language tagging is already available in Unicode, via the tag characters in the deprecated plane. Recte: 1. Plane 14 is not a "deprecated plane". 2. The tag characters in Tag Character block (U+E0000..U+E007F) are not

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Ken Whistler via Unicode
On 1/31/2019 1:41 AM, Egmont Koblinger via Unicode wrote: I mean, for example we can introduce control characters that specify the language. That is a complete non-starter for the Unicode Standard. And if the terminal implementation introduces such as one-off hacks, they will fail

Re: A last missing link for interoperable representation

2019-01-08 Thread Ken Whistler via Unicode
James, On 1/8/2019 1:11 PM, James Kass via Unicode wrote: But we're still using typewriter kludges to represent stress in Latin script because there is no Unicode plain text solution. O.k., that one needs a response. We are still using kludges to represent stress in the Latin script because

Re: The encoding of the Welsh flag

2018-11-21 Thread Ken Whistler via Unicode
Michael, On 11/21/2018 9:38 AM, Michael Everson via Unicode wrote: What really annoys me about this is that there is no flag for Northern Ireland. The folks at CLDR did not think to ask either the UK or the Irish representatives to SC2 about this. Neither CLDR-TC nor SC2 has any

Re: The encoding of the Welsh flag

2018-11-21 Thread Ken Whistler via Unicode
On 11/21/2018 8:00 AM, William_J_G Overington via Unicode wrote: Yet the interoperability does not derive from an International Standard. The interoperability that enabled your mail to be delivered to me derives in part from the MIME standard (RFC 2045 et seq.) which is not an International

Re: The encoding of the Welsh flag

2018-11-20 Thread Ken Whistler via Unicode
On 11/20/2018 12:57 PM, William_J_G Overington via Unicode wrote: quote A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS. end quote My questions are as follows please. Is that encoding for the

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Ken Whistler via Unicode
On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote: I was replying not about the notational representation of the DUCET data table (using [....] unnecessarily) but about the text of UTR#10 itself. Which remains highly confusive, and contains completely unnecessary steps, and just

Re: A sign/abbreviation for "magister"

2018-10-30 Thread Ken Whistler via Unicode
On 10/30/2018 2:32 PM, James Kass via Unicode wrote: but we can't seem to agree on how to encode its abbreviation. For what it's worth, "mgr" seems to be the usual abbreviation in Polish for it. --Ken

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Ken Whistler via Unicode
On 10/29/2018 8:06 PM, James Kass via Unicode wrote: could be typed on old-style mechanical typewriters. Quintessential plain-text, that. Nope. Typewriters were regularly used for underscoring and for strikethrough, both of which are *styling* of text, and not plain text. The mere fact

Re: Dealing with Georgian capitalization in programming languages

2018-10-09 Thread Ken Whistler via Unicode
Martin, On 10/9/2018 12:47 AM, Martin J. Dürst via Unicode wrote: - Using the 'capitalize' method to (try to) get the titlecase property of a MTAVRULI character. (There's no other way currently in Ruby to get the titlecase property.) There may be others. If you have some ideas, I'd

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Ken Whistler via Unicode
On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote: capitalize: uppercase (or title-case) the first character of the string, lowercase the rest When I say "cause problems", I mean producing mixed-case output. I originally thought that 'capitalize' would be fine. It is fine for
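A small sketch of the mixed-case problem described above, written in Python rather than Ruby. It assumes a Python build whose Unicode database is version 11.0 or later (Python 3.7+), where Mkhedruli letters uppercase to Mtavruli but titlecase to themselves. The two helper functions are illustrative names, not any library's API.

```python
def capitalize_with_upper(word):
    """Uppercase the first character, lowercase the rest."""
    return word[:1].upper() + word[1:].lower()


def capitalize_with_title(word):
    """Titlecase the first character, lowercase the rest."""
    return word[:1].title() + word[1:].lower()


LOMI = "\u10da\u10dd\u10db\u10d8"  # Georgian Mkhedruli word used as sample input

# Using the uppercase mapping yields one Mtavruli letter followed by
# Mkhedruli letters, i.e. mixed-case output.
print([f"U+{ord(c):04X}" for c in capitalize_with_upper(LOMI)])

# Using the titlecase mapping leaves the word unchanged.
print([f"U+{ord(c):04X}" for c in capitalize_with_title(LOMI)])
```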

Re: UCD in XML or in CSV?

2018-08-31 Thread Ken Whistler via Unicode
On 8/31/2018 1:36 AM, Manuel Strehl via Unicode wrote: For codepoints.net I use that data to stuff everything in a MySQL database. Well, for some sense of "everything", anyway. ;-) People having this discussion should keep in mind a few significant points. First, the UCD proper isn't

Re: Private Use areas

2018-08-21 Thread Ken Whistler via Unicode
On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote: On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote: On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote: Is there a block of RTL PUA also? No. Perhaps there should be? This is a periodic suggestion

Re: Private Use areas

2018-08-20 Thread Ken Whistler via Unicode
On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote: Is there a block of RTL PUA also? No. --Ken

Re: Tales from the Archives

2018-08-20 Thread Ken Whistler via Unicode
Steffen noted: On 8/20/2018 3:22 PM, Steffen Nurpmeso via Unicode wrote: It was just that i have read on one of the mailing-lists i am subscribed to a cite of a Unicode statement that i have never read of anything on the Unicode mailing-list. It is very awkward, but i _again_ cannot find what

Re: Tales from the Archives

2018-08-20 Thread Ken Whistler via Unicode
Steffen, Are you looking for the Unicode list email archives? https://www.unicode.org/mail-arch/ Those contain list content going back all the way to 1994. --Ken On 8/20/2018 6:08 AM, Steffen Nurpmeso via Unicode wrote: I have the impression that many things which have been posted here

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Ken Whistler via Unicode
On 7/18/2018 6:43 AM, philip chastney via Unicode wrote: there are also contexts where "Hello World!" can be read as the function "Hello", applied to the factorial value of "World" even though such a move wouldn't necessarily remove all ambiguity, the easiest solution is to declare that

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Ken Whistler via Unicode
On 7/16/2018 3:51 PM, Shai Berger via Unicode wrote: And I should add, in response to the other points raised in this thread, from the same page in the core standard: "If the same plain text sequence is given to disparate rendering processes, there is no expectation that rendered text in each

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-29 Thread Ken Whistler via Unicode
On 5/29/2018 12:49 AM, Richard Wordingham via Unicode wrote: How would one know that they are misapplied? And what if the author of the text has broken your rules? Are such texts never to be transcribed to pukka Unicode? Applying Tamil -ii (0BC0, Script=Tamil) to the Latin letter a (0061,

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Ken Whistler via Unicode
On 5/28/2018 9:44 PM, Asmus Freytag via Unicode wrote: One of the general principles is that combining marks inherit the property of their base character. Normally, "inherited" should be the only property value for combining marks. There have been some deviations from this over the

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Ken Whistler via Unicode
On 5/28/2018 9:23 PM, Martin J. Dürst via Unicode wrote: Hello Sundar, On 2018/05/28 04:27, SundaraRaman R via Unicode wrote: Hi, In languages like Ruby or Java (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)), functions to check if a character is
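For comparison, a short Python sketch that inspects U+0BCD with the standard `unicodedata` module. Note the assumption: Python's `str.isalpha()` tests the General_Category (letter categories only), not the Unicode Alphabetic property, so it is only a rough stand-in for the Ruby and Java functions discussed in the thread.

```python
import unicodedata

PULLI = "\u0bcd"  # TAMIL SIGN VIRAMA

info = {
    "name": unicodedata.name(PULLI),
    "category": unicodedata.category(PULLI),  # General_Category
    "ccc": unicodedata.combining(PULLI),      # Canonical_Combining_Class
    "isalpha": PULLI.isalpha(),               # category-based test, see note above
}

for key, value in info.items():
    print(f"{key}: {value}")
```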

Re: Major vendors changing U+1F52B PISTOL depiction from firearm to squirt gun

2018-05-23 Thread Ken Whistler via Unicode
On 5/23/2018 8:53 AM, Abe Voelker via Unicode wrote: As a user I find it troublesome because previous messages I've sent using this character on these platforms may now be interpreted differently due to the changed representation. That aspect has me wondering if this change is in line with

Re: preliminary proposal: New Unicode characters for Arabic music half-flat and half-sharp symbols

2018-05-15 Thread Ken Whistler via Unicode
On 5/15/2018 2:46 PM, Markus Scherer via Unicode wrote: I am proposing the addition of 2 new characters to the Musical Symbols table: - the half-flat sign (lowers a note by a quarter tone) - the half-sharp sign (raises a note by a quarter tone) In an actual proposal, I

Re: Is the Editor's Draft public?

2018-04-20 Thread Ken Whistler via Unicode
Henri, There is no formal concept of a public "Editor's Draft" for the Unicode core specification. This is mostly the result of the tools used for editing the core specification, which is still structured more like a book than the usual online internet specification. Currently the Unicode

Re: Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-02 Thread Ken Whistler via Unicode
On 4/2/2018 7:02 PM, Philippe Verdy via Unicode wrote: We're missing the definition of "ymojis", a safer alternatives of "umojis" (unknown), but that "you" can create yourself for use by yourself Not to mention "əmojis", as in "Uh, Moe! Jeez, why are we still talking about this?!" --Ken

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Ken Whistler via Unicode
On 3/9/2018 9:29 AM, via Unicode wrote: Documented increase such as scientific terms for new elements, flora and fauna, would seem to be not more one or two dozen a year. Indeed. Of the "urgently needed characters" added to the unified CJK ideographs for Unicode 11.0, two were obscure

Re: Translating the standard

2018-03-09 Thread Ken Whistler via Unicode
On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote: As of translating the Core spec as a whole, why did two recent attempts crash even before the maintenance stage, while the 3.1 project succeeded? Essentially because both the Japanese and the Chinese attempts were conceived of as

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Ken Whistler via Unicode
On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote: Shouldn't we create a variant of IDS, using combining joiners between Han base glyphs (then possibly augmented by variant selectors if there are significant differences on the simplification of rendered strokes for each component) ? What

Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-05 Thread Ken Whistler via Unicode
On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote: I have a question; if some people try to make a translated version of Unicode And to add to Asmus' response, folks on the list should understand that even with the best of effort, the concept of a "translated version of Unicode" is a

CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!)

2018-03-05 Thread Ken Whistler via Unicode
John, I think this may be giving the list a somewhat misleading picture of the actual statistics for encoding of CJK unified ideographs. The "500 characters a year" or "1000 characters a year" limits are administrative limits set by the IRG for national bodies (and others) submitting

Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread Ken Whistler via Unicode
David, On 2/22/2018 7:21 PM, David Corbett via Unicode wrote: My confusion stems from Unicode’s online bidi utility. That bidi utility has known defects in it. It is not yet conformant with changes to UBA 6.3, let alone later changes to UBA. And the mapping of memory position to display

Re: IDC's versus Egyptian format controls

2018-02-16 Thread Ken Whistler via Unicode
On 2/16/2018 11:00 AM, Asmus Freytag via Unicode wrote: On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote: That doesn't square well with, "An implementation *may* render a valid Ideographic Description Sequence either by rendering the individual characters separately or by parsing

Re: IDC's versus Egyptian format controls

2018-02-16 Thread Ken Whistler via Unicode
On 2/16/2018 8:22 AM, Ken Whistler wrote: The Egyptian quadrat controls, on the other hand, are full-fledged Unicode format controls. One more point of distinction: The (gc=So) IDC's follow a syntax that uses Polish notation order for the descriptive operators (inherited from the intended
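The Polish (prefix) notation order mentioned above can be sketched with a small recursive parser. This is an illustration only: it covers the twelve Ideographic Description Characters U+2FF0..U+2FFB, treats U+2FF2 and U+2FF3 as the three-operand operators, does no validation, and the sample sequence is a made-up description rather than one taken from the thread.

```python
TERNARY = {"\u2ff2", "\u2ff3"}
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def parse_ids(text, pos=0):
    """Parse one prefix-notation description starting at pos.

    Returns (tree, next_pos), where tree is either a single character
    or a tuple (operator, operand, operand[, operand]).
    """
    ch = text[pos]
    if ch in BINARY or ch in TERNARY:
        arity = 3 if ch in TERNARY else 2
        pos += 1
        operands = []
        for _ in range(arity):
            node, pos = parse_ids(text, pos)
            operands.append(node)
        return (ch, *operands), pos
    return ch, pos + 1


# Operator first, then its operands; the second operand is itself a description.
SAMPLE = "\u2ff0\u6728\u2ff1\u65e5\u6708"
tree, end = parse_ids(SAMPLE)
print(tree, end)
```

Because each operator precedes its operands and has a fixed arity, no brackets are needed to recover the nesting.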

IDC's versus Egyptian format controls (was: Re: Why so much emoji nonsense?)

2018-02-16 Thread Ken Whistler via Unicode
On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote: A more portable solution for ideographs is to render Ideographic Description Sequences (IDS) as approximations to the characters they describe. The Unicode Standard carefully does not prohibit so doing, and a similar scheme is

Re: Why so much emoji nonsense?

2018-02-15 Thread Ken Whistler via Unicode
On 2/15/2018 2:24 PM, Philippe Verdy via Unicode wrote: And it's in the mission of Unicode, IMHO, to promote litteracy Um, no. And not even literacy, either. ;-) https://en.wikipedia.org/wiki/Category:Organizations_promoting_literacy --Ken

Re: Why so much emoji nonsense?

2018-02-14 Thread Ken Whistler via Unicode
On 2/14/2018 12:49 PM, Philippe Verdy via Unicode wrote: RCLLTHTWHNLPHBTSWRFRSTNVNTDPPLWRTTXTLKTHS ! [ ... lots to say about the history of writing ... ] And the use (or abuse) of emojis is returning us to the prehistory when people draw animals on walls of caverns: this was a very

Re: Why so much emoji nonsense?

2018-02-14 Thread Ken Whistler via Unicode
On 2/14/2018 12:53 AM, Erik Pedersen via Unicode wrote: Unlike text composed of the world’s traditional alphabetic, syllabic, abugida or CJK characters, emoji convey no utilitarian and unambiguous information content. I think this represents a misunderstanding of the function of emoji in

Re: Word_Break for Hieroglyphs

2017-12-14 Thread Ken Whistler via Unicode
Gentlemen, On 12/14/2017 6:53 AM, Mark Davis ☕️ via Unicode wrote: Thus I would like people who are both knowledgeable about hieroglyphs /and/ Unicode properties to weigh in. I know that people like Andrew Glass are on this list, who satisfy both criteria. And what constitutes a cluster?

Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Ken Whistler via Unicode
Asmus, On 12/5/2017 12:35 PM, Asmus Freytag via Unicode wrote: I don't know the history of this particular "unification" Here are some clues to guide further research on the history. The annotation in question was added to a draft of the NamesList.txt file for Unicode 4.1 on October 7,

Re: implicit weight base for U+2CEA2

2017-09-27 Thread Ken Whistler via Unicode
On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote: On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode wrote: I recently updated pyuca[1], my pure Python implementation of the Unicode Collation Algorithm to work with
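For readers following the subject line, a minimal sketch of the general implicit-weight computation from UTS #10 (AAAA = base + (CP >> 15), BBBB = (CP & 0x7FFF) | 0x8000). Which base applies to U+2CEA2 is the question under discussion, so the base is left as a parameter here; the function name is illustrative and is not pyuca's API.

```python
def implicit_weights(cp, base):
    """Return the (AAAA, BBBB) primary weight pair for a code point."""
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


# Candidate bases listed in UTS #10 for the general formula.
for base in (0xFB40, 0xFB80, 0xFBC0):
    aaaa, bbbb = implicit_weights(0x2CEA2, base)
    print(f"base {base:04X}: [.{aaaa:04X}.0020.0002][.{bbbb:04X}.0000.0000]")
```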

Re: IBM 1620 invalid character symbol

2017-09-27 Thread Ken Whistler via Unicode
Ken, On 9/27/2017 11:10 AM, Ken Shirriff via Unicode wrote: The IBM type catalog might be of interest. It describes in great detail the character sets of the IBM typewriters and line printers and the custom characters that can be ordered for printer chains and Selectric type balls. Link:

Re: IBM 1620 invalid character symbol

2017-09-27 Thread Ken Whistler via Unicode
Asmus, On 9/27/2017 10:02 AM, Asmus Freytag via Unicode wrote: In that context it's worth remembering that while you could say for most typewriters that "the typewriter is the font", there were noted exceptions. The IBM Selectric, for example, had exchangeable type balls which allowed

Re: IBM 1620 invalid character symbol

2017-09-27 Thread Ken Whistler via Unicode
Leo, On 9/26/2017 9:00 PM, Leo Broukhis via Unicode wrote: The next time I'm at the Mountain View CHM, I'll try to ask. However, assuming it was an overstrike of an X and an I, then where does the "Eris"-like glyph come from? Was there ever an IBM font with a double-semicircular X like )( ?

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Ken Whistler via Unicode
Philippe, Those aren't negative digits, per se. The usage in the manual is with an overline (or macron) to indicate the flag bit. It does occur over a zero, and in explanation in the text of floating point operations, it is also shown over letters (X, M, E) representing digits of the exponent

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Ken Whistler via Unicode
Leo, Yeah, I know. My point was that by examining the physical typewriter keys (the striking head on the typebar, not the images on the keypads), one could see what could be generated *by* overstriking. I think Philippe's suggestion that it was simply an overstrike of "X" with an "I" is

Re: IBM 1620 invalid character symbol

2017-09-25 Thread Ken Whistler via Unicode
The 1620 manual accessed from the Wiki page shows the same information but with a different glyph (which looks more like the capital zhe, and is presumably the source of the glyph cited in the Wiki page itself). See:

Re: Rendering variants of U+3127 Bopomofo Letter I

2017-08-24 Thread Ken Whistler via Unicode
Albrecht, See TUS, Section 18.3, Bopomofo, p. 707: http://www.unicode.org/versions/Unicode10.0.0/ch18.pdf#G22553 --Ken On 8/24/2017 12:19 AM, Dreiheller, Albrecht via Unicode wrote: Hello Chinese experts, The Letter I in the Bopomofo alphabet (U+3127) has two rendering variants, a

Re: emoji props in the ucdxml ?

2017-07-05 Thread Ken Whistler via Unicode
Manuel, I suspect that such a link may already be in the works for the /Public/emoji/ data directory. But if you want to make sure your suggestion is reviewed by the UTC, you should submit it via the contact form: http://www.unicode.org/reporting.html --Ken On 7/5/2017 12:37 PM, Manuel

Re: emoji props in the ucdxml ?

2017-07-05 Thread Ken Whistler via Unicode
On 7/5/2017 10:01 AM, Daniel Bünzli via Unicode wrote: I know the emoji properties [1] are not formally part of the UCD (not sure exactly why though), Because they are maintained as part of an independent standard now (UTS #51), which is still on track to have a faster turnaround -- and

Re: Announcing The Unicode® Standard, Version 10.0

2017-06-21 Thread Ken Whistler via Unicode
I wonder IF 9 times suffice, But IF more are required, I'll tweet ILY, tweet it twice -- Since spelling's been retired. On 6/21/2017 8:37 AM, William_J_G Overington via Unicode wrote: Here is a mnemonic poem, that I wrote on Monday 20 February 2017, now published as U+1F91F is now officially

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Ken Whistler via Unicode
On 6/1/2017 8:32 PM, Richard Wordingham via Unicode wrote: TUS Section 3 is like the Augean Stables. It is a complete mess as a standards document, That is a matter of editorial taste, I suppose. imputing mental states to computing processes. That, however, is false. The rhetorical turn

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Ken Whistler via Unicode
On 6/1/2017 6:21 PM, Richard Wordingham via Unicode wrote: By definition D39b, either sequence of bytes, if encountered by a conformant UTF-8 conversion process, would be interpreted as a sequence of 6 maximal subparts of an ill-formed subsequence. ("D39b" is a typo for "D93b".) Sorry about

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Ken Whistler via Unicode
On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote: You were implicitly invited to argue that there was no need to handle 5 and 6 byte invalid sequences. Well, working from the *current* specification: FC 80 80 80 80 80 and FF FF FF FF FF FF are equal trash, uninterpretable as
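As an illustration only, the two byte sequences named above can be run through one existing decoder. The sketch assumes CPython's built-in UTF-8 codec with `errors="replace"`; other decoders may substitute U+FFFD differently, which is what the thread is about.

```python
SAMPLES = [
    bytes.fromhex("FC8080808080"),
    bytes.fromhex("FFFFFFFFFFFF"),
]


def replacement_count(raw):
    """Decode raw as UTF-8 and count the U+FFFD characters substituted."""
    return raw.decode("utf-8", errors="replace").count("\ufffd")


counts = [replacement_count(sample) for sample in SAMPLES]
print(counts)
```

CPython reports one replacement character per byte for both inputs, treating neither as a longer multi-byte unit.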

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-26 Thread Ken Whistler via Unicode
On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote: The link provided about the PRI doesn't lead to the comments. PRI #121 (August, 2008) pre-dated the practice of keeping all the feedback comments together with the PRI itself in a numbered directory with the name "feedback.html".

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Ken Whistler via Unicode
Richard, On 5/23/2017 1:48 PM, Richard Wordingham via Unicode wrote: The object is to generate code *now* that, up to say Unicode Version 23.0, can work out, from the UCD files DerivedAge.txt and PropertyValueAliases.txt, whether an arbitrary code point was included by some Unicode version
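One way to approach the comparison, sketched under assumptions: Age values are read as "major.minor" strings from DerivedAge.txt data lines of the form `range ; age # comment`, and are compared as integer pairs rather than as strings. The function names are hypothetical, and special values such as "unassigned" are not handled.

```python
def age_key(age):
    """Turn an Age value such as '10.0' into a comparable (major, minor) tuple."""
    major, minor = age.split(".")
    return int(major), int(minor)


def parse_derived_age_line(line):
    """Parse one DerivedAge.txt data line into (first, last, age_key).

    Returns None for blank lines and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    code_range, age = (field.strip() for field in data.split(";"))
    first, _, last = code_range.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    return start, end, age_key(age)


print(parse_derived_age_line("0000..001F    ; 1.1 #  [32] <control>..<control>"))
print(age_key("10.0") > age_key("9.0"))  # numeric comparison
print("10.0" > "9.0")                    # raw string comparison gives the opposite answer
```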

Re: English flag (from Re: How to Add Beams to Notes)

2017-05-03 Thread Ken Whistler via Unicode
On 5/3/2017 3:20 AM, William_J_G Overington via Unicode wrote: Surely a single code point could be found. Single code points are being found for various emoji items on a continuing basis. Why pull up the ladder on encoding some flags each with a single code point? Yes, a single code point