On 7/6/2011 11:18 AM, Asmus Freytag wrote:
The Danes, over a decade ago, when they made the official
recommendation to use SHY appear to have come to the conclusion that
AA can never occur accidentally, except at word division in compounds.
Not really a safe conclusion. :)
On 7/6/2011 1:40 PM, Mark Davis ☕ wrote:
The other two are special cases; they casefold together
because of the
way that the full case mapping is computed. Their equivalence is
normally captured by a canonical-equivalent folding. Because
the simple
On 7/8/2011 10:26 AM, Philippe Verdy wrote:
This is not strictly related to this Unicode version update,
but I have an interesting question about the Unicode Stability Policy.
Summary: How does it apply to the exact value (or aliases) of the
property Decomposition Type (dt), for
On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote:
For the long term, I suggest Unicode should aim for this:
Unicode 6.5 should claim: There will be a *Unicode dictionary*,
limiting and reducing ambiguous semantics within Unicode
(Background: e.g. the word character will have one single
On 7/13/2011 12:45 AM, Jukka K. Korpela wrote:
For one thing, defining “Unicode character” as a technical term and
using it consistently makes it possible to formulate clearly its
relation to “character” in the common meaning, thereby helping people
to understand and use Unicode better.
On 7/13/2011 1:23 PM, Jukka K. Korpela wrote:
I don’t see that biologists use the word “life” in any confusing
manner comparable to the Unicode confusion around “character.” “Life”
isn’t really a central concept in biology, and its use in biology
hardly differs much from everyday use. Defining
Since Jukka seemed to take issue with my responding to his proffered
definitions
by instead bringing up an analogy between life and character, I'll
try responding
directly to the attempted clarifications.
On 7/13/2011 12:45 AM, Jukka K. Korpela wrote:
That’s a completely different issue. The
On 7/15/2011 11:36 AM, Michael Everson wrote:
Look at Figures 8-1 through 8-4 in the Unicode Standard 5.0.
We see graphic characters shown, one representing space and two representing joiners. This is plain text.
Bt. Thanks for playing! But the correct answer
[changing the thread title to disentangle this issue from the Apple
symbol font discussion]
On 7/16/2011 1:08 AM, Julian Bradfield wrote:
The other two could be proposed as unitary symbols, if anybody really
needs to
represent them. They are commensurate with a large number of similar symbols
On 8/2/2011 3:26 PM, stas624-...@yahoo.com wrote:
[Mainly aimed at people who can change roadmaps]
[I used the online feedback form, but got no response, so I am reposting it here.]
Your feedback was forwarded to the roadmap committee, which will consider
it in the context of other requests and
On 8/1/2011 7:26 AM, Naena Guru wrote:
This thread wandered off into an argument about whether U+FEFF ZWNBSP or
U+2060 WJ is best supported and which should be used to inhibit line breaks.
However, there are still several other issues which bear addressing in
Naena Guru's
questions:
The
On 8/12/2011 3:19 PM, Lorna Priest wrote:
Our original proposal had these unified, but for various reasons we
were asked to disunify them.
Lorna
Original Message
Subject: on proposed new Arab script characters for African languages
(n3882)
From: mmarx
On 8/15/2011 10:38 AM, Philippe Verdy wrote:
Unicode cannot encode a combining Wasla (because of various stability
policies), so if Syriac needs a Wasla to be shown only over a letter
or two, one needs to propose precomposed characters for them. Just
like the existing Arabic Alef-Wasla.
On 8/15/2011 8:50 AM, Andreas Prilop wrote:
The Ohm sign should have been encoded as another example of squared
letters and abbreviations. It comes from Asian character sets,
I’d say the ohm sign comes from the MacRoman character set (0xBD).
In general, I agree with Doug Ewell's assessment. I don't see a convincing
case here for the need to encode more control picture characters for C1
controls. There seems to be a confusion here between the need for
glyphs and the need for characters. Also, this would seem to me to
be a receding
On 8/19/2011 2:07 PM, Doug Ewell wrote:
Technically, I think 10646 was always limited to 32,768 planes so that
one could always address a code point with a 32-bit signed integer (a
nod to the Java fans).
Well, yes, but it didn't really have anything to do with Java. Remember
that Java
wasn't
On 8/19/2011 2:53 PM, Benjamin M Scarborough wrote:
Whenever somebody talks about needing 31 bits for Unicode, I always think of
the hypothetical situation of discovering some extraterrestrial civilization
and trying to add all of their writing systems to Unicode. I imagine there
would be
On 8/22/2011 9:58 AM, Jean-François Colson wrote:
I wonder whether you aren’t a little too optimistic.
No. If anything I'm assuming that the folks working on proposals will
be amazingly assiduous during the next decade.
Have you considered the unencoded ideographic scripts?
Why, yes I
On 8/21/2011 3:31 PM, Richard Wordingham wrote:
I expect ARABIC LANGUAGE MARK would not go down well
- has it already been proposed and rejected?.
ARABIC *LETTER* MARK, not *LANGUAGE* mark. (And suggested
to just be renamed to AL MARK.)
Proposed? Yes.
Discussed? Yes.
Rejected? No.
The last
On 8/22/2011 3:15 PM, Richard Wordingham wrote:
On Monday 22 August 2011, Andrew West andrewcw...@gmail.com wrote:
Can anyone think of a way to extend UTF-16 without adding new
surrogates or inventing a new general category?
Andrew
How about a triple sequence of two
On 8/24/2011 10:48 AM, Richard Wordingham wrote:
Those are two different claims. 'Never say never' is a useful maxim.
So is 'Leave well enough alone.'
The problem would be in using maxims instead
of an analysis of engineering requirements to drive architectural decisions.
The extension of
On 8/24/2011 3:51 PM, Richard Wordingham wrote:
Well, in that case, the correct action is to work to ensure that code
points are not squandered.
Have there not already been several failures on that front? The BMP is
littered with concessions to the limitations of rendering systems -
On 8/26/2011 3:13 PM, Philippe Verdy wrote:
Isn't there an intersection between NameAliases.txt proposed in
PRI202, and the informational table defined for UTR #25 at
http://www.unicode.org/Public/math/revision-12/MathClassEx-12.txt
which also lists other name aliases for other standards ?
No.
On 8/26/2011 5:01 PM, Philippe Verdy wrote:
we could as well include... are dangerous words here. Going encyclopedic
is *completely* at odds with the normative intention of NameAliases.txt.
Your statement then contradicts what PRI 202 says:
the intent is to add various standard and de facto
On 9/12/2011 9:13 AM, Philippe Verdy wrote:
Well, wasn't the ISCII standard naming the script Bengali? It also
gave the name Assamese, but was it a synonym or did it require a
separate codepage switching code ?
They were separate. Annex A of ISCII 1991 shows Bengali (BNG) and
Assamese (ASM)
On 9/28/2011 12:12 PM, delex r wrote:
Not possible. Character and block names cannot be changed once they are
assigned. It's two decades too late to make that change. The most that can be
done now is adding a few annotations for Assamese.
—Ben Scarborough
...It's two decades too late to
On 10/13/2011 10:49 PM, Peter Cyrus wrote:
Is there a definition or guideline for the distinction between plain
text and rich text?
I think where you may be getting hung up is trying to define plain
text versus rich text in terms of the content and/or appearance of
the text (i.e. the
On 10/14/2011 11:47 AM, Joó Ádám wrote:
Peter asked for what the Unicode Consortium considers plain text, ie.
what principles it applies when deciding whether to encode a certain
element or aspect of writing as a character. In turn, you thoroughly
explained that plain text is what the Unicode
On 10/17/2011 1:23 AM, Peter Cyrus wrote:
Perhaps the idea of something embedded in the text that then controls
the display of the subsequent run of text is the very definition of
markup, whether or not that markup is a special character or an
ASCII sequence like/spanspan style=gait:xxx;
On 10/19/2011 12:08 PM, Mark E. Shoulson wrote:
I think the issue here is (probably) a matter of legacy encodings,
though someone else would need to confirm that.
O.k., as self-appointed historian of the standard, I guess I need to be
the one to answer that. ;-)
The Yiddish digraphs were
On 11/9/2011 9:30 AM, Asmus Freytag wrote:
On 11/9/2011 1:18 AM, Martin J. Dürst wrote:
I tried to find something like a normative description of the default
bidi class of unassigned code points.
In UTR #9, it says
On 11/14/2011 2:39 PM, Naena Guru wrote:
On the other hand, no company would send people to work at Unicode if
they did not have an economic interest.
One might as well rephrase that as:
No company would send people to work at *any standard* if they did not
have an economic interest.
And
On 11/17/2011 11:28 PM, Philippe Verdy wrote:
Could the Unicode text specify that a left half mark, when it is
followed by a right half-mark on the same line, has to be joined ? And
which character can we select in a font to mark the intermediate
characters between them ?
No.
This kind of
On 11/18/2011 11:21 AM, Peter Cyrus wrote:
Ken, you mention defined markup constructions, but nothing would
prevent specialized rendering software from, for example, connecting a
left half mark with the corresponding right half mark via titlo, even
though the text is still only plain text with
On 11/18/2011 1:30 PM, Karl Williamson wrote:
How is this different from Named sequences, which are published
provisionally?
Named sequences aren't character properties.
When a newly encoded character is published in the standard, its code point,
its name, and dozens of other properties all
On 11/18/2011 5:24 PM, Philippe Verdy wrote:
This arc in the example is definitely NOT mathematics
Nor did I say it was.
(even if you
have read a version where it was attempted to represent it using a
Math TeX notation in this page, an obvious error because it used an
angular \widehat and
On 11/18/2011 5:36 PM, Philippe Verdy wrote:
I have absolutely no clear way to represent sequences like in this
example that use such elongated diacritic applied to runs of more than
two characters.
Nor should you expect to be able to represent such things in plain text.
Such
conventions are
On 11/22/2011 11:02 AM, a...@peoplestring.com wrote:
In one of the discussions in this community, it was stated that once
assigned, the name of a character cannot be changed. But I have noticed
some characters have their name changed eg 'ARABIC LETTER YEH BARREE'
(U+06D2) was previously named
On 12/9/2011 9:06 AM, Andreas Prilop wrote:
Arabic letter U+0682 shows two dots above.
It has the cryptic remark not used in modern Pashto.
But was it ever used?
To understand where the cryptic remark came from, you need to know
more about the history of the character in the standard.
U+0682
On 1/9/2012 12:23 PM, Asmus Freytag wrote:
So, my question remains, are there any other avenues besides
hot-metal printed text
I assume that was an exaggeration for rhetorical effect -- since hot-metal
printing technology went out half a century ago, replaced first by
phototypesetting and then
On 1/17/2012 4:43 AM, satai wrote:
I would like to address two textual issues in this proposal.
These are not actually textual issues in the *proposal*, but rather
issues regarding
the annotation of the code charts for these additions.
1) U+10C8—U+10CC and U+2D28—U+2D2C are marked as
On 1/27/2012 1:16 PM, Matt Ma wrote:
Hi,
There are a few characters having no decomposition type defined in
UnicodeData.txt, but they were assigned tertiary weight in
allkeys.txt as if the characters had decomposition type. Here are a
few examples (version 6.0.0),
...
U+A733, U+A732,
On 2/23/2012 2:44 PM, António Martins-Tuválkin wrote:
It is defined as
33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;;
in UnicodeData.txt, but it is shown as pH in code chart. Should it be
0070 0048 or PH?
It should certainly be pH, i.e., <square> 0070 0048,
because that's
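A small sketch for checking what the character database actually records for U+33D7. It only reads the decomposition through Python's standard unicodedata module; the helper names are made up for illustration, and the result depends on the Unicode data bundled with your Python build.

```python
import unicodedata

SQUARE_PH = "\u33d7"  # U+33D7 SQUARE PH


def raw_decomposition(ch: str) -> str:
    """Return the decomposition field as stored in the character database."""
    return unicodedata.decomposition(ch)


def compatibility_form(ch: str) -> str:
    """Return the NFKC normalization, which applies the compatibility mapping."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    # The decomposition is recorded with two capitals (0050 0048),
    # even though the code chart glyph shows "pH".
    print(raw_decomposition(SQUARE_PH))
    print(compatibility_form(SQUARE_PH))
```

Because decomposition mappings fall under the normalization stability guarantees, a mismatch of this kind between data and chart glyph would normally be documented rather than corrected in the data.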
On 3/5/2012 11:44 AM, Philippe Verdy wrote:
So what do you propose ?
It doesn't matter what *Michael* proposes at this point. These have already
been approved by both the UTC and WG2 and are currently in DAM ballot.
- Encoding the new precomposed pairs as a new combining character
(there may
On 3/5/2012 11:56 AM, Philippe Verdy wrote:
Note that the first alternative is the one used in the DAM for
encoding a separate COMBINING LATIN SMALL LETTER A/O/U WITH DIAERESIS
Correct.
But the document cited by Denis gives a much more productive way that
allows stacking any kind of letters
On 3/5/2012 12:17 PM, Benjamin M Scarborough wrote:
On Mon, Mar 5, 2012 at 19:09, Michael Everson wrote:
No, because both the combining-a and the combining-diaeresis are bound to the
base letter; the combining diaeresis is not bound to the combining-a.
Just like the proposed U+1ABB COMBINING
On 3/5/2012 12:51 PM, Philippe Verdy wrote:
You are so much attached to keep the existing encoding model
unchanged,
Yep. That's why I work on *standards*, after all.
that now you are going to prepare for LOTS of additions of
combining Latin characters with diacritics... The BMP won't be
On 3/5/2012 2:01 PM, Denis Jacquerye wrote:
Wouldn't CGJ be useful in some way in cases like that of the cedilla
or the light centralization stroke 1AB9 ?
Base character + combining letter + CGJ + combining cedilla would be
clear, the cedilla would not be moved.
How is that simpler than Base
On 3/5/2012 2:32 PM, Denis Jacquerye wrote:
I guess it's less messy than other situations. I just couldn't help
wondering why combining letters with diacritics are being encoded but
letters with diacritics are out of the question.
Because the combining ones are *not* decomposed, and hence don't
On 3/6/2012 2:34 PM, Leo Broukhis wrote:
On 3/6/12, Doug Ewell d...@ewellic.org wrote:
Speaking of U+17D2 KHMER SIGN COENG, what is a conforming renderer to
do if someone writes A្B ? (U+0041 U+17D2 U+0042)
Roll its eyes?
I guess :), but how should it look on the screen?
Just the way your
On 3/6/2012 3:19 PM, Leo Broukhis wrote:
On 3/6/12, Ken Whistler k...@sybase.com wrote:
On 3/6/2012 2:34 PM, Leo Broukhis wrote:
On 3/6/12, Doug Ewell d...@ewellic.org wrote:
Speaking of U+17D2 KHMER SIGN COENG, what is a conforming renderer to
do if someone writes A្B ? (U+0041 U+17D2
On 3/6/2012 4:25 PM, Leo Broukhis wrote:
What about Grapheme_Extend class characters placed out of context? It
would be nice to see a dotted box in cases like AׁB
(U+0041 U+05C1 HEBREW POINT SHIN DOT U+0042)
That is pretty much up to the rendering system or font designer.
--Ken
On 3/6/2012 8:27 PM, fantasai wrote:
Unicode has a Pc category into which it assigns various low lines:
_U+005F LOW LINE
‿U+203F UNDERTIE
⁀U+2040 CHARACTER TIE
⁔U+2054 INVERTED UNDERTIE
Those 4 are the actual connectors. The concept arose because of the
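The four characters listed above can be checked against the General_Category data with a few lines of Python. This is only an illustrative sketch using the standard unicodedata module; the list and the function name are hypothetical, not part of any Unicode tooling.

```python
import unicodedata

# The four characters named in the message above.
CONNECTORS = ["\u005f", "\u203f", "\u2040", "\u2054"]


def general_category(ch: str) -> str:
    """Return the two-letter General_Category value for one character."""
    return unicodedata.category(ch)


if __name__ == "__main__":
    for ch in CONNECTORS:
        # Print code point, character name, and General_Category.
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} gc={general_category(ch)}")
```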
On 4/3/2012 9:51 AM, Shawn Steele wrote:
My assumption is the page uses JS to get the dates? Since my user
locale happened to be set to Klingon, that’s what it displayed.
Exactly. There is a call to:
Date(document.lastModified).toLocaleString() in the Javascript.
So for those who assumed
On 4/3/2012 6:57 PM, Karl Williamson wrote:
Is it an error on the web site that this policy was in effect in 2.0,
and it really should be 3.0? (as there are no such decompositions in the
data files starting in 3.0).
Yes.
Or were these data files defective?
No.
The research to determine how
On 4/15/2012 10:04 PM, Asmus Freytag wrote:
The 1E00 and 1F00 blocks were populated, in Unicode 1.1 by rejects
from Unicode 1.0 that were re-admitted as part of the merger with
ISO/IEC 10646. If you have anyone with access to the early (paper
only) meeting documents of WG2, you might, just
On 4/25/2012 6:55 AM, Juanma Barranquero wrote:
Ada 2012 is adding (quoting from the ARM):
A.4.11 String Encoding
[...]
{AI05-0137-2} {AI05-0262-1} The type Encoding_Scheme defines encoding
schemes. UTF_8 corresponds to the UTF-8 encoding scheme defined by
Annex D of ISO/IEC 10646. UTF_16BE
On 4/27/2012 10:45 AM, Richard Wordingham wrote:
If they are to be adopted by the CLDR, the digits need to be coded
consecutively.
I doubt this matters in any case, because this proposed use is for
a vigesimal system, which has digits 0..19, not digits 0..9. Trying to
treat the first 10 digits
On 4/30/2012 3:33 PM, Richard Wordingham wrote:
One is not compelled to construct U+3039 (〹) 'twenty' from two U+3038
(〸) 'ten', so a CUNEIFORM TWO U may well be missing.
It looks as though it is.
No, it isn't.
It was present in Proposal N2664
On 5/1/2012 11:19 AM, Michael Everson wrote:
It does not matter if sideways text can be read as words, or just as gibberish.
Good practice and typographic design will not rotate syllabic text because of
the inherent confusability.
Michael has a generally valid point. Rotating *small*
On 5/16/2012 2:54 PM, Richard Wordingham wrote:
Similar remarks apply to 'reorder'. What if I move 'Q' and 'q' into
the Cyrillic sequence? (I've a recollection that this letter is used
in Kurdish written in Cyrillic.)
Obsolete recollection. See:
051A;CYRILLIC CAPITAL LETTER
On 5/21/2012 4:37 PM, Richard Wordingham wrote:
Again, even the interpretation of uppercase in terms of weights is not
certain, for the ISO/IEC 14651:2007 example of a tailoring for
uppercase first does not adjust the collation elements with a tertiary
weight of 1C, although they are listed as
On 5/23/2012 7:05 AM, William_J_G Overington asked:
For example, if a situation arose where a fast timetable is set for introducing one or
more new currencies, each with a new currency symbol, is there a contingency plan in
place such that what is presently set to be called Unicode 6.2 becomes
On 6/1/2012 1:51 PM, Doug Ewell wrote:
At what point does text
encoded in a vendor's private-use extension to Shift-JIS become
Shift-JIS encoded text?
A possibly less confusing way to put this is:
At what point does text encoded in a vendor's private-use extension
to *JIS X 0208* become
On 6/20/2012 3:22 PM, Karl Williamson wrote:
All current named sequences appear to be each a single grapheme. That
seems like it should always be the case.
Possibly, but keep in mind that neither the Unicode Standard nor UAX #29
in particular
define what a grapheme is. UAX #29 specifies an
On 6/21/2012 11:22 PM, Julian Bradfield wrote:
So, as long as code charts create production issues, print-on-demand for
them is effectively not feasible.
My hard-copy of the code charts was printed by Lulu - they're too big
to print out on my office laserprinters!
The only issue was joining
On 6/22/2012 3:55 PM, John H. Jenkins wrote:
Wait a minute. Isn't 6.2 just adding the Turkish Lira? Does that really take
the chart people more than about 10 minutes?
The only *character* change is the Turkish lira. There are numerous updates
to UAXes and other parts of the
On 7/10/2012 4:22 PM, Mark Davis ☕ wrote:
I would disagree about the preference for ratio; I think it is a
historical accident in Unicode.
Not really.
The following pairs dating from Unicode 1.0 were deliberate:
U+002D HYPHEN-MINUS
U+2212 MINUS SIGN
U+002F SOLIDUS (Unicode 1.0 called it
On 7/13/2012 1:54 PM, Stephan Stiller wrote:
So there is a BOM-ambiguity when a file starts with
FF FE
and then a couple of U+ characters, yes? Because this could be
either UTF-16 or UTF-32 under little-endianness. Has this been pointed
out and discussed beforehand?
No, there is
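The reply above is cut off, so the following is not Ken's answer, only a sketch of how a byte-order-mark sniffer commonly sidesteps the overlap. It assumes the case in question is FF FE followed by 00 00, which is the UTF-32LE signature and also begins with the UTF-16LE signature. The function name sniff_bom is hypothetical; the BOM constants come from Python's standard codecs module.

```python
import codecs
from typing import Optional

# Longer signatures are listed first, so FF FE 00 00 is reported as
# UTF-32LE rather than as UTF-16LE followed by U+0000.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess an encoding from a leading signature; None if no BOM is found."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


if __name__ == "__main__":
    print(sniff_bom(b"\xff\xfe\x00\x00A\x00\x00\x00"))  # UTF-32LE signature
    print(sniff_bom(b"\xff\xfeA\x00"))                  # UTF-16LE signature
```

This is a heuristic for untagged data. When the encoding is declared by a higher-level protocol, that declaration is what decides how the bytes are read.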
On 7/25/2012 5:01 PM, Richard Wordingham wrote:
What is the formal relationship between the Common Locale Data
Repository (CLDR) and International Components for Unicode (ICU)?
...
The ICU implementation of collation tailoring for changed ordering is
bizarre in some complicated cases. (Life
On 7/26/2012 1:21 PM, Richard Wordingham wrote:
I thought the Unicode Consortium had a formal policy of forbidding
untrue (or misleading) claims of conformance to Unicode standards.
No. What would be the point? Voluntary standards organizations have
no mechanism for policing compliance.
Sure,
On 7/26/2012 4:20 PM, Richard Wordingham wrote:
Perhaps I've read too much into
http://www.unicode.org/policies/logo_policy.html . The implication is
that untrue or misleading claims using the word 'Unicode' are
contravening the trademark.
That's more on the level of making sure that when
On 7/26/2012 5:32 PM, Asmus Freytag wrote:
However, such a misleading claim might subject someone to civil suit,
don't you think?
Sure, if someone could make a reasonable case that the misleading
claim led to damages and wanted to litigate.
But that isn't something that the Unicode Consortium
On 8/13/2012 10:11 AM, Peter Edberg wrote:
I do not believe it was for accounting, logic, or mathematical use. It was included in the original
Macintosh character set as shown in Figure 2 of the Font Manager chapter of Inside Macintosh,
volume I (1985), but was not included in the shaded
On 8/13/2012 12:50 PM, Asmus Freytag wrote:
In that context, you can't distinguish a lozenge from a squished
diamond (*) from a diamond suit symbol.
While the character is one of a set, it was not uncommon to have
people make do with somewhat similar characters standing in for each
other.
On 8/16/2012 9:32 AM, Erkki I Kolehmainen wrote:
Although the stroke is not a diacritic, keyboard drivers can be made to
generate atomic characters with stroke by using a dead letter key for stroke
together with the base character.
And in addition to this observation by Erkki, it is also the
Markus has already explained this. But the following explanation
fills out some details. These @@ lines are conveniences for chart
production. They are headers read by the unibook chart layout
tool, which help guide where chart layout for a block starts and stops.
The @@ lines are *NOT* block
I think this discussion is confusing the need for separate syntactic
functions
in formal language definitions with the need for *encoding* of characters.
The distinction between assignment and test for equality has been around for
decades in formal languages, and of course it is almost always
Philippe may have overlooked the fact that this has been tried (years
ago) in the
Unicode Standard. See: language tags.
http://www.unicode.org/versions/Unicode7.0.0/ch23.pdf#G26419
The syntax for those even goes beyond just ISO 639-2/3 to incorporate
the full range of BCP 47 tags, in
To follow up on Doug Ewell's response, the mechanism currently
standardized in the Unicode Standard for regional indicator codes
has an interpretation tied to the two-letter codes of ISO 3166-1,
and *not* to TLD's. The two are not directly connected.
If anyone really wants to pursue getting a
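To make the tie to two-letter codes concrete, here is a minimal sketch that maps a two-letter code onto a pair of regional indicator symbols (U+1F1E6..U+1F1FF). The function name is hypothetical, and the sketch does not check that the code is actually assigned in ISO 3166-1; it only performs the letter-to-symbol arithmetic.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicator_pair(alpha2: str) -> str:
    """Map a two-letter code such as 'FR' to two regional indicator symbols."""
    code = alpha2.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {alpha2!r}")
    # Each letter A..Z maps to the symbol at the same offset from U+1F1E6.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


if __name__ == "__main__":
    pair = regional_indicator_pair("FR")
    print(" ".join(f"U+{ord(ch):04X}" for ch in pair))
```

Nothing in this mapping knows about top-level domains, which is consistent with the point that the two are not directly connected.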
On 3/28/2015 1:05 PM, Karl Williamson wrote:
In the 8.0 Beta files, some numerical values are not reduced to their
lowest forms. Is there a compelling reason that
109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
is not written as
109FB;MEROITIC CURSIVE FRACTION SIX
Search engine companies (and in particular, Google) have such
information squirreled away in their index databases, at least as
far as usage stats for Unicode characters on the web go -- but it
is proprietary information, and they generally don't publish
information about such statistics.
On 3/23/2015 8:35 AM, William_J_G Overington wrote:
Origin of the digital encoding of accented characters for Esperanto
Twelve accented characters (uppercase versions and lowercase versions
of six accented letters) used for Esperanto are encoded in Unicode.
WJO is referring to U+0109,
For ISO 8859-3, the answer is in the wiki:
http://en.wikipedia.org/wiki/ISO/IEC_8859-3
It was designed to cover Turkish, Maltese and Esperanto, ...
The answer for IBM CP905 is simple -- it is simply the EBCDIC
code page of June, 1986 that corresponded to ISO 8859-3.
That also covers the answer
Taking this thread back to the original question...
The Line_Break property values for halfwidth katakana (lb=AL)
and regular katakana (lb=ID) have been stable since they
were first defined for Unicode 3.0 -- 15 years ago.
Regardless of whether lb=AL is the optimal assignment for
the halfwidth
Suzuki-san,
On 5/1/2015 8:25 AM, suzuki toshiya wrote:
Excuse me, is there any record of the discussion about the UAX#14 class
for halfwidth katakana from 15 years ago? If there is, I would like to
see a sample text (of halfwidth katakana) and the expected layout
result for it.
The *founding* document for the
And to put Mark's comments in some statistical perspective, in the context
of all the media hype, the true big bang for emoji in Unicode was
Version 6.0,
released over 4-1/2 years ago now. *That* was the Unicode release that added
hundreds and hundreds of emoji for Japanese carrier
On 6/3/2015 5:17 PM, John wrote:
so what?
There should be a standard way to put custom characters anywhere that
characters belong and have things “just work”.
Well, that's the rub, isn't it?
We (in IT) are still working pretty dang hard on the simpler problem, to
wit:
There
Karl,
This results from the fact that the fallback behavior for the modifiers is
simply as independent pictographic blorts, i.e. the color swatch images.
That is also related to why they are treated as gc=Sk symbol modifiers,
rather than as combining marks or format characters.
If you *support*
Karl,
As usual, the situation is way more complicated than perhaps it has any
business
being!
It isn't just Version 1 Hangul that have to be considered, but also
Version 1.1 Hangul.
Version 1.0 contained 2350 Hangul syllables, encoded in the range
3400..3D2D.
Version 1.1 contained 6646
the early 1990's might know, however.
--Ken
On 6/24/2015 1:03 PM, Karl Williamson wrote:
On 06/19/2015 04:12 PM, Ken Whistler wrote:
The Unicode 2.0 set of 11,172 was known as the Johab set from KS C
5601-1992.
That was an algorithmically designed replacement of the earlier sets
from
Korean
and
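The figure of 11,172 is what the syllable arithmetic used for the modern Hangul block (starting at U+AC00) produces: 19 leading consonants times 21 vowels times 28 trailing slots, the last including "no trailing consonant". The sketch below follows that arithmetic as given in the Unicode Standard's Hangul composition description; the function name is made up, and the snippet above does not say whether the Johab set was originally laid out with exactly this formula.

```python
S_BASE = 0xAC00          # first precomposed Hangul syllable
L_COUNT = 19             # leading consonants
V_COUNT = 21             # vowels
T_COUNT = 28             # trailing consonants, index 0 meaning "none"
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_syllable(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Return the precomposed syllable for zero-based jamo indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    print(S_COUNT)
    first = compose_syllable(0, 0)
    last = compose_syllable(L_COUNT - 1, V_COUNT - 1, T_COUNT - 1)
    print(f"U+{ord(first):04X}..U+{ord(last):04X}")
```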
specifications in the development of products, but to discourage
attempts to use the data in nonconformant or otherwise misleading
implementations that would undermine the intended open interoperability
of the Unicode Standard for all.
Clear?
--Ken Whistler, Technical Director, Unicode, Inc.
On 6
Michel Suignard (editor of ISO/IEC 10646) responded to these questions,
but let me augment his response with some more detailed history here.
(Pardon the length of the reply, but these things tend never to be as
simple as people assume and hope they are.)
On 5/28/2015 2:08 PM, Chris wrote:
So
Doug,
Read on in the minutes to the next day. 143-C27 and related actions.
There are a few things to keep in mind here.
1. The un-deprecation of the tags U+E0020..U+E007E *is* part of
the UCD for Unicode 8.0. The change has already taken place in
the revised beta files now posted (see
On 5/29/2015 5:20 PM, gfb hjjhjh wrote:
1. I have seen a chinese character ⿰言亜 from a Vietnamese dictionary
NHAT DUNG THUONG DAM DICTIONARY**
So, a.) In http://www.unicode.org/alloc/Pipeline.html , it show that
CJK Extension E and F have already been accepted, but where can I
check
On 6/2/2015 2:01 AM, William_J_G Overington wrote:
Local glyph memory, for use in compressing a document where the same
glyph is used two or more times in the document:
Um, that technology already exists. It is called a font.
A mechanism to be able to use the method to define a glyph
On 7/2/2015 5:56 PM, Peter Constable wrote:
Erkki, in this case, I think Philippe is making valid points.
-For the proposal to be workable requires some means of ensuring
stability of encoded representations. The way this would be done would
be for CLDR to provide data with all valid
On 7/3/2015 9:14 PM, Leo Broukhis wrote:
On Fri, Jul 3, 2015 at 12:50 PM, Doug Ewell d...@ewellic.org wrote:
Leo Broukhis leob at mailcom dot com wrote:
What I don't like about PRI #399 is its proposing to use default-
ignorable characters. On a non-vexillology-aware platform, I'd like
to
Noah,
Additional information you should have is that the UTC is about to
publish a new Public Review Issue on the topic of an extended mechanism
for the representation of more flag emoji with sequences of tag characters.
(Note: *not* representation as encoded single character symbols.)
That
On 7/2/2015 2:01 AM, Philippe Verdy wrote:
The frozen status of Antarctica ...
... will be addressed separately by global warming. But be that as it may...
In reality there's still no standard way to encode flags unambiguously
and in a stable way. We'd like to have FOTW (Flags of the