On 2/21/2020 7:53 AM, Costello, Roger L. via Unicode wrote:
Text files may indeed contain binary (i.e., bytes that are not
interpretable as characters). Namely, text files may contain newlines,
tabs, and some other invisible things.
Question: "characters" are defined as only the visible
Well, no, in this case "strange" means strange, as Ken Lunde notes. I'm
just pointing to his list, because it pulls together quite a few Han
characters that *also* have dubious cases for encoding.
Or you could turn the argument around, I suppose, and note that just
because the hieroglyph for
You want "dubious"?!
You should see the hundreds of strange characters already encoded in the
CJK *Unified* Ideographs blocks, as recently documented in great detail
by Ken Lunde:
https://www.unicode.org/L2/L2020/20059-unihan-kstrange-update.pdf
Compared to many of those, a hieroglyph of a
Richard,
What it comes down to is avoidance of conundrums involving canonical
reordering for normalization. The effect of variation selectors is
defined in terms of an immediate adjacency. If you allowed variation
selectors to be defined for combining marks of ccc!=0, then
normalization of
Richard,
Given that those particular two variation selectors have already given
very specific semantics for emoji sequences, and would now be expected
to occur *only* in emoji sequences:
https://www.unicode.org/reports/tr51/#def_text_presentation_selector
usurping them to do something
Shriramana,
That category is used to track character(s) in process that may have
been approved by WG2 but are not yet in ballot, or are in contention,
and may have just been dropped from ballot, but which still have
sufficient visibility to be tracked.
The process is a bit rough around the
Shriramana,
On 12/20/2019 6:29 PM, Shriramana Sharma via Unicode wrote:
I was looking at the pipeline for something else, and for the first
time I see a character category: “not accepted by the UTC but in ISO
ballot” and two characters in it.
Those two characters changed status as of December
On 12/20/2019 7:17 AM, wjgo_10...@btinternet.com via Unicode wrote:
It is indeed interesting that the Notice of Non-Approval itself uses
italics for emphasis in two places.
That text, at the present time, cannot be expressed in Unicode plain
text with the emphasis that the Notice of
On 10/30/2019 10:41 AM, wjgo_10...@btinternet.com via Unicode wrote:
At present I have a question to which I cannot find the answer.
Is the QID emoji format, if approved by the Unicode Technical
Committee going to be sent to the ISO/IEC 10646 committee for
consideration by that committee?
On 10/12/2019 3:15 AM, Fred Brennan via Unicode wrote:
There seems to be no conscionable reason for such a long delay after the
approval.
If that's just how things are done, fine, I certainly can't change the whole
system. But imagine if you had to wait two years to even have a chance of
Sorry about the typo there. I meant "the published Version 13.0 next March"
--Ken
On 10/11/2019 10:17 AM, Ken Whistler wrote:
then eventually in the published Version 13.0 next month:
Short answer is no.
The characters in the pipeline section labeled "Characters Accepted for
Version 13.0" are what will be in the beta review for 13.0 (look for
that sometime next month), and then eventually in the published Version
13.0 next month:
Fred,
2 hours and 33 minutes from now (today). But you don't need to try to
synch a proposal like this to a particular script ad hoc meeting. That
group meets roughly once a month, and any new proposal coming in right
now wouldn't be on the Unicode 13.0 train, even if the UTC immediately
On 9/26/2019 4:21 AM, Fred Brennan via Unicode wrote:
There is a clear demand for a SQUARE TB. In the font SMotoya Sinkai W55 W3,
which is ©2008 株式会社 モトヤ, the glyph is unencoded and accessed via the
Discretionary Ligatures (`dlig`) OpenType feature. It has name `T_B.dlig`.
Aye, there's the
On 8/14/2019 4:32 PM, James Kass via Unicode wrote:
If a character gets deprecated, can its decomposition type be changed
from canonical to compatibility?
Simple answer: No.
--Ken
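For readers who want to see the distinction the question turns on, here is a minimal Python sketch; it is not part of the original reply. It uses the standard unicodedata module to report whether a character's decomposition mapping is canonical or compatibility. The helper name decomposition_kind and the sample characters are illustrative choices. U+2329 is used on the assumption that the interpreter's Unicode data, like current versions of the UCD, lists it as a deprecated character with a canonical singleton mapping.

```python
import unicodedata


def decomposition_kind(ch: str) -> str:
    """Classify the decomposition mapping recorded for a single character."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings start with a tag such as <compat> or <font>;
    # canonical mappings are a bare list of code points.
    return "compatibility" if mapping.startswith("<") else "canonical"


if __name__ == "__main__":
    samples = {
        "\u2329": "LEFT-POINTING ANGLE BRACKET (deprecated)",
        "\uFB01": "LATIN SMALL LIGATURE FI",
        "A": "LATIN CAPITAL LETTER A",
    }
    for ch, label in samples.items():
        print(f"U+{ord(ch):04X} {label}: {decomposition_kind(ch)}")
```

Under that assumption the deprecated U+2329 reports a canonical mapping, and per the answer above, deprecation would not turn it into a compatibility one.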
Your helpful suggestions will be passed along to the people working on
the new site.
In the meantime, please note that the link to the "Unicode Technical
Site" has been added to the left column of quick links in the page
bottom banner, so it is easily available now from any page on the new
See the entry for "Magar Akkha" on:
http://linguistics.berkeley.edu/sei/scripts-not-encoded.html
Anshuman Pandey did preliminary research on this in 2011.
http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf
It would be premature to assign an ISO 15924 script code, pending the
research to
On 7/18/2019 11:50 AM, Steffen Nurpmeso via Unicode wrote:
I also decided to enter /L2 directly from now on.
For folks wishing to access the UTC document register, Unicode
Consortium standards, and so forth, all of those links will be
permanently stable. They are not impacted by the
On 7/17/2019 4:54 PM, Philippe Verdy via Unicode wrote:
then the Unicode version (age) used for Hieroglyphs should also be
assigned to Hieratic.
It is already.
In fact the ligatures system for the "cursive" Egyptian Hieratic is so
complex (and may also have its own variants showing its
On 7/3/2019 10:47 AM, Sławomir Osipiuk via Unicode wrote:
Is my idea impossible, useless, or contradictory? Not at all.
What you are proposing is in the realm of higher-level protocols.
You could develop such a protocol, and then write processes that honored
it, or try to convince others
On 4/30/2019 12:45 AM, Julian Bradfield via Unicode wrote:
What is its appropriate Unicode representation?
A macron.
--Ken
On 3/13/2019 2:42 AM, Janusz S. Bień via Unicode wrote:
Hi!
On Mon, Jul 16 2018 at 7:07 +02, Janusz S. Bień via Unicode wrote:
FAQ (http://unicode.org/faq/vs.html) states:
For historic scripts, the variation sequence provides a useful tool,
because it can show mistaken or nonce
Egmont,
On 2/9/2019 11:48 AM, Egmont Koblinger via Unicode wrote:
Are there any (non-CJK) scripts for which crossword puzzles don't exist?
There are crossword puzzles for Hindi (in the Devanagari script). Just
do an image search for "Hindi crossword puzzle".
But the conventions for these
Richard,
On 2/1/2019 1:30 PM, Richard Wordingham via Unicode wrote:
Language tagging is already available in Unicode, via the tag characters
in the deprecated plane.
Recte:
1. Plane 14 is not a "deprecated plane".
2. The tag characters in Tag Character block (U+E0000..U+E007F) are not
On 1/31/2019 1:41 AM, Egmont Koblinger via Unicode wrote:
I mean, for
example we can introduce control characters that specify the language.
That is a complete non-starter for the Unicode Standard. And if the
terminal implementation introduces such as one-off hacks, they will fail
James,
On 1/8/2019 1:11 PM, James Kass via Unicode wrote:
But we're still using typewriter kludges to represent stress in Latin
script because there is no Unicode plain text solution.
O.k., that one needs a response.
We are still using kludges to represent stress in the Latin script
because
Michael,
On 11/21/2018 9:38 AM, Michael Everson via Unicode wrote:
What really annoys me about this is that there is no flag for Northern Ireland.
The folks at CLDR did not think to ask either the UK or the Irish
representatives to SC2 about this.
Neither CLDR-TC nor SC2 has any
On 11/21/2018 8:00 AM, William_J_G Overington via Unicode wrote:
Yet the interoperability does not derive from an International Standard.
The interoperability that enabled your mail to be delivered to me derives in
part from the MIME standard (RFC 2045 et seq.) which is not an International
On 11/20/2018 12:57 PM, William_J_G Overington via Unicode wrote:
quote
A Unicode Technical Standard (UTS) is an independent specification. Conformance
to the Unicode Standard does not imply conformance to any UTS.
end quote
My questions are as follows please.
Is that encoding for the
On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote:
I was replying not about the notational representation of the DUCET
data table (using [....] unnecessarily) but about the text of
UTR#10 itself. Which remains highly confusing, and contains completely
unnecessary steps, and just
On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
but we can't seem to agree on how to encode its abbreviation.
For what it's worth, "mgr" seems to be the usual abbreviation in Polish
for it.
--Ken
On 10/29/2018 8:06 PM, James Kass via Unicode wrote:
could be typed on old-style mechanical typewriters. Quintessential
plain-text, that.
Nope. Typewriters were regularly used for underscoring and for
strikethrough, both of which are *styling* of text, and not plain text.
The mere fact
Martin,
On 10/9/2018 12:47 AM, Martin J. Dürst via Unicode wrote:
- Using the 'capitalize' method to (try to) get the titlecase
property of a MTAVRULI character. (There's no other way
currently in Ruby to get the titlecase property.)
There may be others. If you have some ideas, I'd
On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote:
capitalize: uppercase (or title-case) the first character of the
string, lowercase the rest
When I say "cause problems", I mean producing mixed-case output. I
originally thought that 'capitalize' would be fine. It is fine for
On 8/31/2018 1:36 AM, Manuel Strehl via Unicode wrote:
For codepoints.net I use that data to stuff everything in a MySQL
database.
Well, for some sense of "everything", anyway. ;-)
People having this discussion should keep in mind a few significant points.
First, the UCD proper isn't
On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:
On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
Is there a block of RTL PUA also?
No.
Perhaps there should be?
This is a periodic suggestion
On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
Is there a block of RTL PUA also?
No.
--Ken
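The "No" can be checked against the character database. The following is a small Python sketch, not part of the original exchange; it looks up the Bidi_Class that the standard unicodedata module reports for the first code point of each Private Use range. The names PUA_SAMPLES and bidi_classes are hypothetical, and the result reflects whichever Unicode version the running interpreter bundles.

```python
import unicodedata

# First code point of each Private Use range:
# the BMP Private Use Area, Plane 15, and Plane 16.
PUA_SAMPLES = [0xE000, 0xF0000, 0x100000]


def bidi_classes(code_points):
    """Map each code point to the Bidi_Class reported by unicodedata."""
    return {
        f"U+{cp:04X}": unicodedata.bidirectional(chr(cp))
        for cp in code_points
    }


if __name__ == "__main__":
    for name, bidi in bidi_classes(PUA_SAMPLES).items():
        print(name, bidi)
```

All three ranges are expected to report the left-to-right class "L" by default, so right-to-left behavior for private-use characters has to come from a private agreement or a higher-level protocol, not from a dedicated block.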
Steffen noted:
On 8/20/2018 3:22 PM, Steffen Nurpmeso via Unicode wrote:
It was just that i have read on one of the mailing-lists i am
subscribed to a cite of a Unicode statement that i have never read
of anything on the Unicode mailing-list. It is very awkward, but
i _again_ cannot find what
Steffen,
Are you looking for the Unicode list email archives?
https://www.unicode.org/mail-arch/
Those contain list content going back all the way to 1994.
--Ken
On 8/20/2018 6:08 AM, Steffen Nurpmeso via Unicode wrote:
I have the impression that many things which have been posted here
On 7/18/2018 6:43 AM, philip chastney via Unicode wrote:
there are also contexts where "Hello World!" can be read as
the function "Hello", applied to the factorial value of "World"
even though such a move wouldn't necessarily remove all ambiguity,
the easiest solution is to declare that
On 7/16/2018 3:51 PM, Shai Berger via Unicode wrote:
And I should add, in response to the other points raised in this
thread, from the same page in the core standard: "If the same plain text
sequence is given to disparate rendering processes, there is no
expectation that rendered text in each
On 5/29/2018 12:49 AM, Richard Wordingham via Unicode wrote:
How would one know that they are misapplied? And what if the author of
the text has broken your rules? Are such texts never to be transcribed
to pukka Unicode?
Applying Tamil -ii (0BC0, Script=Tamil) to the Latin letter a (0061,
On 5/28/2018 9:44 PM, Asmus Freytag via Unicode wrote:
One of the general principles is that combining marks inherit the
property of their base character.
Normally, "inherited" should be the only property value for combining
marks.
There have been some deviations from this over the
On 5/28/2018 9:23 PM, Martin J. Dürst via Unicode wrote:
Hello Sundar,
On 2018/05/28 04:27, SundaraRaman R via Unicode wrote:
Hi,
In languages like Ruby or Java
(https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)),
functions to check if a character is
On 5/23/2018 8:53 AM, Abe Voelker via Unicode wrote:
As a user I find it troublesome because previous messages I've sent
using this character on these platforms may now be interpreted
differently due to the changed representation. That aspect has me
wondering if this change is in line with
On 5/15/2018 2:46 PM, Markus Scherer via Unicode wrote:
I am proposing the addition of 2 new characters to the Musical
Symbols table:
- the half-flat sign (lowers a note by a quarter tone)
- the half-sharp sign (raises a note by a quarter tone)
In an actual proposal, I
Henri,
There is no formal concept of a public "Editor's Draft" for the Unicode
core specification. This is mostly the result of the tools used for
editing the core specification, which is still structured more like a
book than the usual online internet specification.
Currently the Unicode
On 4/2/2018 7:02 PM, Philippe Verdy via Unicode wrote:
We're missing the definition of "ymojis", a safer alternative to
"umojis" (unknown), but that "you" can create yourself for use by
yourself
Not to mention "əmojis", as in "Uh, Moe! Jeez, why are we still talking
about this?!"
--Ken
On 3/9/2018 9:29 AM, via Unicode wrote:
Documented increase such as scientific terms for new elements, flora
and fauna, would seem to be not more than one or two dozen a year.
Indeed. Of the "urgently needed characters" added to the unified CJK
ideographs for Unicode 11.0, two were obscure
On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote:
As of translating the Core spec as a whole, why did two recent attempts crash
even before the maintenance stage, while the 3.1 project succeeded?
Essentially because both the Japanese and the Chinese attempts were
conceived of as
On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
Shouldn't we create a variant of IDS, using combining joiners between
Han base glyphs (then possibly augmented by variant selectors if there
are significant differences on the simplification of rendered strokes
for each component) ? What
On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:
I have a question; if some people try to make a
translated version of Unicode
And to add to Asmus' response, folks on the list should understand that
even with the best of effort, the concept of a "translated version of
Unicode" is a
John,
I think this may be giving the list a somewhat misleading picture of the
actual statistics for encoding of CJK unified ideographs. The "500
characters a year" or "1000 characters a year" limits are administrative
limits set by the IRG for national bodies (and others) submitting
David,
On 2/22/2018 7:21 PM, David Corbett via Unicode wrote:
My confusion stems from Unicode’s online bidi utility.
That bidi utility has known defects in it. It is not yet conformant with
changes to UBA 6.3, let alone later changes to UBA. And the mapping of
memory position to display
On 2/16/2018 11:00 AM, Asmus Freytag via Unicode wrote:
On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote:
That doesn't square well with, "An implementation *may* render a valid
Ideographic Description Sequence either by rendering the individual
characters separately or by parsing
On 2/16/2018 8:22 AM, Ken Whistler wrote:
The Egyptian quadrat controls, on the other hand, are full-fledged
Unicode format controls.
One more point of distinction: The (gc=So) IDC's follow a syntax that
uses Polish notation order for the descriptive operators (inherited from
the intended
On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote:
A more portable solution for ideographs is to render an Ideographic
Description Sequences (IDS) as approximations to the characters they
describe. The Unicode Standard carefully does not prohibit so doing,
and a similar scheme is
On 2/15/2018 2:24 PM, Philippe Verdy via Unicode wrote:
And it's in the mission of Unicode, IMHO, to promote litteracy
Um, no. And not even literacy, either. ;-)
https://en.wikipedia.org/wiki/Category:Organizations_promoting_literacy
--Ken
On 2/14/2018 12:49 PM, Philippe Verdy via Unicode wrote:
RCLLTHTWHNLPHBTSWRFRSTNVNTDPPLWRTTXTLKTHS !
[ ... lots to say about the history of writing ... ]
And the use (or abuse) of emojis is returning us to the prehistory
when people draw animals on walls of caverns: this was a very
On 2/14/2018 12:53 AM, Erik Pedersen via Unicode wrote:
Unlike text composed of the world’s traditional alphabetic, syllabic, abugida
or CJK characters, emoji convey no utilitarian and unambiguous information
content.
I think this represents a misunderstanding of the function of emoji in
Gentlemen,
On 12/14/2017 6:53 AM, Mark Davis ☕️ via Unicode wrote:
Thus I would like people who are both knowledgeable about hieroglyphs
/and/ Unicode properties to weigh in. I know that people like Andrew
Glass are on this list, who satisfy both criteria.
And what constitutes a cluster?
Asmus,
On 12/5/2017 12:35 PM, Asmus Freytag via Unicode wrote:
I don't know the history of this particular "unification"
Here are some clues to guide further research on the history.
The annotation in question was added to a draft of the NamesList.txt
file for Unicode 4.1 on October 7,
On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote:
On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode wrote:
I recently updated pyuca[1], my pure Python implementation of the
Unicode Collation Algorithm to work with
Ken,
On 9/27/2017 11:10 AM, Ken Shirriff via Unicode wrote:
The IBM type catalog might be of interest. It describes in great
detail the character sets of the IBM typewriters and line printers and
the custom characters that can be ordered for printer chains and
Selectric type balls. Link:
Asmus,
On 9/27/2017 10:02 AM, Asmus Freytag via Unicode wrote:
In that context it's worth remembering that while you could say
for most typewriters that "the typewriter is the font", there were
noted exceptions. The IBM Selectric, for example, had exchangeable
type balls which allowed
Leo,
On 9/26/2017 9:00 PM, Leo Broukhis via Unicode wrote:
The next time I'm at the Mountain View CHM, I'll try to ask. However,
assuming it was an overstrike of an X and an I, then where does the
"Eris"-like glyph come from? Was there ever an IBM font with a
double-semicircular X like )( ?
Philippe,
Those aren't negative digits, per se. The usage in the manual is with an
overline (or macron) to indicate the flag bit. It does occur over a
zero, and in explanation in the text of floating point operations, it is
also shown over letters (X, M, E) representing digits of the exponent
Leo,
Yeah, I know. My point was that by examining the physical typewriter
keys (the striking head on the typebar, not the images on the keypads),
one could see what could be generated *by* overstriking. I think
Philippe's suggestion that it was simply an overstrike of "X" with an
"I" is
The 1620 manual accessed from the Wiki page shows the same information
but with a different glyph (which looks more like the capital zhe, and
is presumably the source of the glyph cited in the Wiki page itself). See:
Albrecht,
See TUS, Section 18.3, Bopomofo, p. 707:
http://www.unicode.org/versions/Unicode10.0.0/ch18.pdf#G22553
--Ken
On 8/24/2017 12:19 AM, Dreiheller, Albrecht via Unicode wrote:
Hello Chinese experts,
The Letter I in the Bopomofo alphabet (U+3127) has two rendering
variants, a
Manuel,
I suspect that such a link may already be in the works for the
/Public/emoji/ data directory. But if you want to make sure your
suggestion is reviewed by the UTC, you should submit it via the contact
form:
http://www.unicode.org/reporting.html
--Ken
On 7/5/2017 12:37 PM, Manuel
On 7/5/2017 10:01 AM, Daniel Bünzli via Unicode wrote:
I know the emoji properties [1] are not formally part of the UCD (not sure
exactly why though),
Because they are maintained as part of an independent standard now (UTS
#51), which is still on track to have a faster turnaround -- and
I wonder IF 9 times suffice,
But IF more are required,
I'll tweet ILY, tweet it twice --
Since spelling's been retired.
On 6/21/2017 8:37 AM, William_J_G Overington via Unicode wrote:
Here is a mnemonic poem, that I wrote on Monday 20 February 2017, now published
as U+1F91F is now officially
On 6/1/2017 8:32 PM, Richard Wordingham via Unicode wrote:
TUS Section 3 is like the Augean Stables. It is a complete mess as a
standards document,
That is a matter of editorial taste, I suppose.
imputing mental states to computing processes.
That, however, is false. The rhetorical turn
On 6/1/2017 6:21 PM, Richard Wordingham via Unicode wrote:
By definition D39b, either sequence of bytes, if encountered by a
conformant UTF-8 conversion process, would be interpreted as a
sequence of 6 maximal subparts of an ill-formed subsequence.
("D39b" is a typo for "D93b".)
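The "six maximal subparts" reading can be observed with a stock decoder. What follows is a minimal Python sketch, not taken from the thread itself. It assumes the two byte sequences under discussion are FC 80 80 80 80 80 and FF FF FF FF FF FF, and it relies on CPython's built-in UTF-8 codec, which in replace mode substitutes one U+FFFD per rejected subpart. The names FC_SEQ, FF_SEQ, is_well_formed, and replacement_count are hypothetical.

```python
# Two six-byte sequences that are ill-formed under the current UTF-8 definition.
FC_SEQ = bytes([0xFC, 0x80, 0x80, 0x80, 0x80, 0x80])
FF_SEQ = bytes([0xFF] * 6)


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def replacement_count(data: bytes) -> int:
    """Count the U+FFFD characters produced when decoding in replace mode."""
    return data.decode("utf-8", errors="replace").count("\ufffd")


if __name__ == "__main__":
    for label, seq in (("FC 80 x5", FC_SEQ), ("FF x6", FF_SEQ)):
        print(label, is_well_formed(seq), replacement_count(seq))
```

With this codec each of the six bytes is rejected on its own, so both sequences yield six replacement characters and neither is treated as a single legacy five- or six-byte form.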
Sorry about
On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote:
You were implicitly invited to argue that there was no need to handle
5 and 6 byte invalid sequences.
Well, working from the *current* specification:
FC 80 80 80 80 80
and
FF FF FF FF FF FF
are equal trash, uninterpretable as
On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote:
The link provided about the PRI doesn't lead to the comments.
PRI #121 (August, 2008) pre-dated the practice of keeping all the
feedback comments together with the PRI itself in a numbered directory
with the name "feedback.html".
Richard,
On 5/23/2017 1:48 PM, Richard Wordingham via Unicode wrote:
The object is to generate code *now* that, up to say Unicode Version 23.0,
can work out, from the UCD files DerivedAge.txt and
PropertyValueAliases.txt, whether an arbitrary code point was included
by some Unicode version
On 5/3/2017 3:20 AM, William_J_G Overington via Unicode wrote:
Surely a single code point could be found. Single code points are being found
for various emoji items on a continuing basis. Why pull up the ladder on
encoding some flags each with a single code point?
Yes, a single code point