Karl Williamson shared:
> https://www.reddit.com/r/Denver/comments/fsmn87/quarantine_boredom_my_emoji_map_of_colorado/?mc_cid=365e908e08&mc_eid=0700c8706b
It's too bad this was only made available as an image, not as text, which of
course it is.
--
Doug Ewell | Thor
ked at GB18030, there
> were a lot of them not in Unicode.
I'd forgotten that there were still about two dozen GB18030 characters mapped,
more or less officially, into the Unicode PUA. But again, I changed the
subject. Sorry about that.
--
Doug Ewell | Thornton, CO, US | ewellic.org
one of the PUA space
considered appropriate for that in the meantime?
--
Doug Ewell | Thornton, CO, US | ewellic.org
Adam Borowski wrote:
> Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF
> or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has
> its uses but is not well-formed Unicode.
I'd be interested in your elaboration on what these uses are.
--
Doug Ew
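For reference, the gap between what UTF-8-style byte patterns can carry and what counts as well-formed UTF-8 can be sketched in Python. This is only an illustration, assuming CPython's built-in strict "utf-8" codec; the helper name is_well_formed_utf8 is hypothetical and does not come from the thread.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Highest Unicode scalar value, U+10FFFF: four bytes, well-formed.
MAX_SCALAR = "\U0010FFFF".encode("utf-8")
# Three-byte pattern that would decode to the surrogate U+D800.
SURROGATE = b"\xed\xa0\x80"
# Four-byte pattern that would decode to U+110000, past the Unicode range.
BEYOND = b"\xf4\x90\x80\x80"

for label, raw in [("U+10FFFF", MAX_SCALAR),
                   ("U+D800", SURROGATE),
                   ("U+110000", BEYOND)]:
    print(label, raw.hex(), is_well_formed_utf8(raw))
```

Python's "surrogatepass" error handler is one existing way to carry lone surrogates through the codec deliberately; it does not make the bytes well-formed UTF-8.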
c way. If any of them has that structure interrupted by random bytes, the format has been broken and the file is corrupt. It is no different for text data, which is expected to contain certain bytes and is normally not expected to be interrupted by a series of ran
Does that help?
--
Doug Ewell | Thornton, CO, US | ewellic.org
eat, that means I'll
be able to start using and exchanging them in March, when Unicode 12.1
is released, right? Uh, no:
1. What Ken said above.
2. Unicode 12.1 was always just about the Reiwa sign.
3. Even when 13 comes out, fonts won't be immediately and magically
updated to incl
've heard, anyway.
Just out of curiosity, does anyone have actual examples of such
applications? This might help demonstrate why the Reiwa sign doesn't set
a precedent for TB et al.
--
Doug Ewell | Thornton, CO, US | ewellic.org
㍽ ㍼ ㍻ that could not simply
use the two existing characters 令和 for Reiwa.
--
Doug Ewell | Thornton, CO, US | ewellic.org
le candidate for a proposal. UTC won't add a
character based on mailing-list chat, of course; they'll need a proper
proposal. They'll also be the ones to decide what code point is
assigned, although the proposal can politely suggest one.
--
Doug Ewell | Thornton, CO, US | ewellic.org
seem to be non-controversial.
So to reiterate, these characters appear vanishingly unlikely to be
atomically encoded, "yet" or ever, for good reason.
--
Doug Ewell | Thornton, CO, US | ewellic.org
cy
character sets (which I think is what Philippe is referring to as
"proposed groupings").
--
Doug Ewell | Thornton, CO, US | ewellic.org
was pretty much
that.
While the Unix/C "end of string" convention was not the only case in which NUL
was hijacked, it is certainly the best-known, and the greatest impediment to
any current attempt to use it with its original meaning.
--
Doug Ewell | Thornton, CO, US | ewellic.org
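The effect of the C "end of string" convention on NUL can be seen from Python through the standard ctypes module. This is a small sketch, not something from the thread: the same buffer is read once as a C string and once as raw bytes.

```python
import ctypes

# Five payload bytes with a NUL in the middle; ctypes appends one more
# NUL as the C terminator, so the buffer is six bytes long.
buf = ctypes.create_string_buffer(b"AB\x00CD")

c_view = buf.value   # C-string view: stops at the first NUL
raw_view = buf.raw   # raw view: every byte in the buffer

print(c_view)    # the part before the embedded NUL
print(raw_view)  # the full buffer, including both NULs
```

Any data that uses NUL with another meaning is silently cut short by the C-string view, which is the impediment described above.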
or, and "display
a .notdef glyph" is one of the popular choices.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Andrew West wrote:
> oh, there is no Wikidata QID for phone dropped in the toilet.
It's Wikidata, right? Pretty much anyone can create an item for pretty much
anything, right? Problem solved.
--
Doug Ewell | Thornton, CO, US | ewellic.org
mind of the
phone's owner, and there are none for the brand and model of phone and toilet.
So the sequence above is clearly inadequate for people to express themselves.
--
Doug Ewell | Thornton, CO, US | ewellic.org
rds the medical
> profession.
If physicians and other medical professionals are relying on emoji, in any way
and at any time, to determine diagnosis and treatment, the state of health care
is much worse than I thought.
--
Doug Ewell | Thornton, CO, US | ewellic.org
g truly similar to and/or derivative of
Format A.)
Please reply on-list only if you think the list at large would benefit
from your reply. I'm hoping some of the Unicode elders might have some
insight here.
--
Doug Ewell | Thornton, CO, US | ewellic.org
classic
rule about "established, not ephemeral" would still apply.
--
Doug Ewell | Thornton, CO, US | ewellic.org
physical challenges; but neither of those is what Unicode is about.
For non-emoji characters, there is usually still a requirement to show a
certain level of actual usage.
--
Doug Ewell | Thornton, CO, US | ewellic.org
ht, and the fonts are wrong.
--
Doug Ewell | Thornton, CO, US | ewellic.org
announcements at unicode.org wrote:
> The alpha version of Unicode CLDR 35
> <http://cldr.unicode.org/index/downloads/cldr-35> is available for
> testing.
No downloadable data files in the sense of released builds, correct?
--
Doug Ewell | Thornton, CO, US | ewellic.org
r). The email was not intended as a final
proposal. I do think it would be strange for single and double underlining not
to cancel each other out.
> Note that these do NOT nest (no stack...), just state changes for the
> relevant PART of the "graphic" (i.e. style) state. So the approach in
> that regard is quite different from the approach done in HTML/CSS.
I don't regard that as either a bug or a feature. I certainly don't expect that
every such mechanism has to nest, simply because SGML and its descendants are
designed that way.
--
Doug Ewell | Thornton, CO, US | ewellic.org
-only convention will ever be.
--
Doug Ewell | Thornton, CO, US | ewellic.org
http://www.unicode.org/faq/utf_bom.html#utf8-2
--
Doug Ewell | Thornton, CO, US | ewellic.org
ne or the Plane That Shall Not Be Mentioned.
"Deprecated" is a term of art in Unicode.
--
Doug Ewell | Thornton, CO, US | ewellic.org
tes contain non-ASCII characters...)
None of these are part of Andrew's mechanism. It's just b, i, u, and s.
> is not standard
Neither Andrew nor anyone else claimed it was.
> (it's just an experiment in one font),
It applies to any TrueType font, because the rendering engine can apply these
four styles (in any combination) to any TrueType font.
> and would in fact not be compatible with the existing specification
> for tags.
Good thing nobody claimed they were.
> So only U+E0020 through U+E0040, and U+E005B through U+E007E remain
> deprecated.
Da capo.
--
Doug Ewell | Thornton, CO, US | ewellic.org
moji flags. Only U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are
deprecated.
--
Doug Ewell | Thornton, CO, US | ewellic.org
-6, but should
> be of interest for the generalised subject of this thread.
I'm hoping we can continue to restrict this thread to plain text.
--
Doug Ewell | Thornton, CO, US | ewellic.org
in the use of Mathematical Alphanumeric
Symbols: they look tempting and are (usually) easy to render, but among
other things, they only cover [A-Za-zıȷΑ-Ωα-ω] and thus miss much
of the text that may need to be italicized.
--
Doug Ewell | Thornton, CO, US | ewellic.org
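The coverage problem can be illustrated with a minimal sketch that maps text onto the Mathematical Italic letters. The function name math_italic is hypothetical, and only the basic Latin letters are handled here; the Greek and dotless letters in the block are left out for brevity.

```python
def math_italic(text: str) -> str:
    """Map A-Z and a-z to Mathematical Italic; leave everything else alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif ch == "h":
            # U+1D455 is reserved; italic h is U+210E PLANCK CONSTANT.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            # Accented letters, digits, and other scripts have no
            # italic counterpart in this block and pass through unchanged.
            out.append(ch)
    return "".join(out)


print(math_italic("cafe"))  # every letter is converted
print(math_italic("café"))  # the final letter stays upright
```

A word like "café" comes out with three italic letters and one upright one, which is the kind of gap described above.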
if it does not, why we should not simply refer to the more
familiar 6429?
--
Doug Ewell | Thornton, CO, US | ewellic.org
ins why SCSU had to be banished to the hut, right around
the same time the Plane 14 language tags were deprecated. In SCSU,
astral characters can be 1 byte just like BMP characters.
--
Doug Ewell | Thornton, CO, US | ewellic.org
is an anti-goal
of Unicode. But these are NOT the same idea, and the fact that they both
use Plane 14 tag characters doesn't make them so.
--
Doug Ewell | Thornton, CO, US | ewellic.org
se because it's abuse, doesn't cover other writing
systems, etc.
I'd be happy to work with Kent to campaign for ISO 6429 as "the"
well-established standard for applying simple styling to plain text, but
we would have to acknowledge the significant challenges.
--
Doug Ewell | Thornton, CO, US | ewellic.org
that was? I don't have time to go through scores of messages, and there
is no search facility.
I can't speak for Andrew, but I strongly suspect he implemented this as
a proof of concept, not to declare himself the Maker of Standards.
--
Doug Ewell | Thornton, CO, US | ewellic.org
ttp://www.unicode.org/copyright.html, and
2. send a quick note to the Consortium officers asking whether they are
OK with this use of the Unicode name.
--
Doug Ewell | Thornton, CO, US | ewellic.org
ke this
to implement features like background and foreground colors, inverse
video, and more, which are not available as plain-text characters.
--
Doug Ewell | Thornton, CO, US | ewellic.org
not to this list, since the digest will clobber such
characters (quod vide).
--
Doug Ewell | Thornton, CO, US | ewellic.org
become particularly widespread within the Koalib community. (And no,
this does not constitute "disdain for the small community.")
--
Doug Ewell | Thornton, CO, US | ewellic.org
I know this is an old argument and this will probably never be fixed,
but I wish the Unicode email digest could be updated to support, you
know, Unicode.
--
Doug Ewell | Thornton, CO, US | ewellic.org
includes non-math
characters (like circled and fullwidth) as well, and is probably better,
although it doesn't include the ransom-note option:
http://www.babelstone.co.uk/Unicode/text.html
--
Doug Ewell | Thornton, CO, US | ewellic.org
t get either the Ulster Banner or St. Patrick's
Saltire.
This situation is described, and explicitly so for the UM flags, in
Annex B of UTS #51 under "Caveats."
--
Doug Ewell | Thornton, CO, US | ewellic.org
lid or non-existent.
I would certainly like to use the flag of Colorado, whose visual
appearance is very much standardized, but the vicious circle of vendor
support and UTS #51 categorization means no system will offer glyph
support, and some systems may even reject it as invalid.
--
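For readers unfamiliar with the mechanism, here is a rough sketch of how a subdivision flag sequence is built under UTS #51: U+1F3F4 WAVING BLACK FLAG, then one tag character per character of the subdivision code, then U+E007F CANCEL TAG. The function name subdivision_flag is hypothetical, and whether any system displays the result as a flag depends entirely on vendor support.

```python
BLACK_FLAG = "\U0001F3F4"   # WAVING BLACK FLAG, the tag base
CANCEL_TAG = "\U000E007F"   # terminates the tag sequence


def subdivision_flag(code: str) -> str:
    """Build an emoji tag sequence for a code such as 'US-CO'."""
    compact = code.lower().replace("-", "")
    if not (compact.isascii() and compact.isalnum()):
        raise ValueError(f"unexpected subdivision code: {code!r}")
    # Each tag character is the ASCII code point plus 0xE0000.
    tags = "".join(chr(0xE0000 + ord(c)) for c in compact)
    return BLACK_FLAG + tags + CANCEL_TAG


colorado = subdivision_flag("US-CO")
print([f"U+{ord(c):04X}" for c in colorado])
```

On systems without a glyph for the sequence, the usual fallback is a plain black flag, possibly followed by nothing visible.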
g list. If you want to propose something,
you should consider writing a proposal.
--
Doug Ewell | Thornton, CO, US | ewellic.org
it does not
> argue for a new SEVEN WITH STROKE character or that I should use Ƶ
> rather than Z when I write *Ƶanƶibar.
http://www.unicode.org/L2/L2018/18323-open-four.pdf
--
Doug Ewell | Thornton, CO, US | ewellic.org
Do we have any other evidence of this usage, besides a single
handwritten postcard?
--
Doug Ewell | Thornton, CO, US | ewellic.org
U+FC63 (presentation forms).
Arabic presentation forms are never an example of anything, and their
use is full of caveats.
--
Doug Ewell | Thornton, CO, US | ewellic.org
rich-text Web page, it could easily.
The article "English numerals" does include a bullet point: "The
suffixes -th, -st, -nd and -rd are occasionally written superscript
above the number itself." Note the word "occasionally."
--
Doug Ewell | Thornton, CO, US | ewellic.org
You're joking, right?
Аа Бб Ее Рр
This undermines a lot of what you are claiming to know about writing
systems, and about the difference between case distinctions and styling.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Bringing U+02B3 or U+036C into the discussion just
fuels the recurring demands for every Latin letter (and eventually those
in other scripts) to be duplicated in subscript and superscript, à la
L2/18-206.
Back into my hole now.
--
Doug Ewell | Thornton, CO, US | ewellic.org
encoding," he was
asking about the basic definition of base64.
--
Doug Ewell | Thornton, CO, US | ewellic.org
touched on this a little bit in UTN #14, from the standpoint of trying
to improve compression by normalizing the Unicode text first.
--
Doug Ewell | Thornton, CO, US | ewellic.org
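As a rough illustration of the idea of normalizing before compressing (not the method from UTN #14 itself), the sketch below converts decomposed text to NFC with Python's unicodedata module and compares UTF-8 sizes. Whether this helps a given compressor on real data is not something the example demonstrates, and normalization does change the code point sequence, so it is only lossless up to canonical equivalence.

```python
import unicodedata
import zlib

# "résumé" written with combining acute accents (decomposed form).
decomposed = "re\u0301sume\u0301"
composed = unicodedata.normalize("NFC", decomposed)

decomposed_size = len(decomposed.encode("utf-8"))
composed_size = len(composed.encode("utf-8"))
print(decomposed_size, composed_size)

# The normalized text is what gets handed to the compressor.
packed = zlib.compress(composed.encode("utf-8"))
restored = zlib.decompress(packed).decode("utf-8")
```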
g to solve by introducing LS and PS, but we
know how that went.)
3. Unicode data files can be read and processed on any platform, but
some careful choice of reading and processing tools might be advisable.
--
Doug Ewell | Thornton, CO, US | ewellic.org
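One example of why the choice of tool matters, sketched in Python and not taken from the thread: str.splitlines() treats U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR as line boundaries, while splitting on "\n" alone does not.

```python
text = "one\u2028two\u2029three\r\nfour"

# Recognizes LS, PS, CR LF, and several other line boundaries.
by_splitlines = text.splitlines()

# Only splits at LF, and leaves the CR attached to the previous piece.
by_newline = text.split("\n")

print(by_splitlines)
print(by_newline)
```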
with the correct pagination."
which similarly assumes that "users of Microsoft Windows" have only
Notepad at their disposal.
--
Doug Ewell | Thornton, CO, US | ewellic.org
iven these two alternatives.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Original message
Message: 3
Date: Thu, 30 Aug 2018 02:27:33 +0200 (CEST)
From: Marcel Schneider via Unicode
Curiously, UnicodeData.txt is lacking the header line. That makes it inflexible.
I never wonder
On August 23, 2011, Asmus Freytag wrote:
> On 8/23/2011 7:22 AM, Doug Ewell wrote:
>> Of all applications, a word processor or DTP application would want
>> to know more about the properties of characters than just whether
>> they are RTL. Line breaking, word breaking, and
2
and 3 without step 1. I'd gladly participate in such a project.
--
Doug Ewell | Thornton, CO, US | ewellic.org
work.
I have anecdotes, if anyone is interested off-list.
--
Doug Ewell | Thornton, CO, US | ewellic.org
platform for emoji to take off to such an
extent that... well, we know the rest. If private-use is good enough for
a legacy encoding, it ought to be good enough for Unicode.
--
Doug Ewell | Thornton, CO, US | ewellic.org
e organization.
--
Doug Ewell | Thornton, CO, US | ewellic.org
ation of character
identity, but Arial Unicode MS has not been updated since 2000 and this
problem is likely to remain unsolved.
--
Doug Ewell | Thornton, CO, US | ewellic.org
ith actual
General Categories? For example, an 'Mc' followed by ZWSP followed by an
'Lo' displays like such-and-so. The code points would be best.
Incidentally, does CLDR define the rendering of soft hyphen, or is one
entirely at the mercy of the application?
Why would this b
't apply to Tamil, of course."
In any case, Ken has answered the real underlying question: a process
that checks whether each character in a sequence is "alphabetic" is
inappropriate for determining whether the sequence constitutes a word.
--
Doug Ewell | Thornton, CO, US | ewellic.org
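A short sketch of the pitfall, using Python's str.isalpha() as a stand-in for a per-character "alphabetic" check. This is an assumption for illustration: isalpha() is based on General Category (letters only), not on the Unicode Alphabetic property, but the failure mode is the same in kind.

```python
import unicodedata

# The word "Tamil" in Tamil script: TA, MA, VOWEL SIGN I, LLLA, VIRAMA.
word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"

categories = [unicodedata.category(ch) for ch in word]
per_char = [ch.isalpha() for ch in word]

print(categories)      # letters mixed with combining marks
print(per_char)        # the marks fail the per-character test
print(word.isalpha())  # so the whole word "fails" as well
```

The word is perfectly ordinary, yet a check that requires every character to pass rejects it because of the combining marks.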
ese don't apply to Tamil, of course.
--
Doug Ewell | Thornton, CO, US | ewellic.org
uage has its own language-specific alphabet.
It is the same for Bengali and Assamese, although the language-specific
subsets are called abugidas instead of alphabets.
--
Doug Ewell | Thornton, CO, US | ewellic.org
I wrote:
> ক্ is a conjunct consisting of three code points
s/ক্/ক্ষ/
--
Doug Ewell | Thornton, CO, US | ewellic.org
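The three code points of the corrected conjunct can be listed with a few lines of Python; this is just a check using the standard unicodedata module.

```python
import unicodedata

# KA + VIRAMA + SSA, which renders as a single conjunct.
conjunct = "\u0995\u09CD\u09B7"

names = [unicodedata.name(ch) for ch in conjunct]
for ch, name in zip(conjunct, names):
    print(f"U+{ord(ch):04X} {name}")
```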
ingle attribute byte assignment in ISCII. Disunifying Assamese from
Bengali in Unicode would have a much greater impact.
--
Doug Ewell | Thornton, CO, US | ewellic.org
, German,
Spanish, and hundreds of other languages written in the Latin script, if
the Assamese proposal is approved we can expect similar disunification
of the Latin script into language-specific alphabets in the future.
--
Doug Ewell | Thornton, CO, US | ewellic.org
points?"
I did appreciate the Acknowledgements section which lists the members of
ABBA as a source of inspiration.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Oh, let him have a little fun. At least he's using emoji for something
related to characters, instead of playing Mr. Potato Head.
Incidentally, more prior art on large-base encoding:
https://sites.google.com/site/markusicu/unicode/base16k
--
Doug Ewell | Thornton, CO, US | ewellic.org
for improvement".
I think that is a measurement of locale coverage -- whether the
collation tables and translations of "a.m." and "p.m." and "a week ago
Thursday" are correct and verified -- not character coverage.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Unicode"
project. So either the UDHR translation is wildly incorrect, which seems
unlikely, or the transliteration tables are incomplete.
Wikipedia shows digraphs Iý ıý for Ю ю, and Ia ıa for Я я, and
nothing for the others, though it is not clear where the digraphs came
from, and of course the
e," implying an outside chance
that it might be.
I've linked Manish's post on FB as a reply to one of those mainstream
articles that repeatedly calls the conjunct a "single character,"
written by a staffer who couldn't be bothered to find out how a writing
syst
t is especially popular in the
IETF. It is not intended for situations that require explanation or
details.
--
Doug Ewell | Thornton, CO, US | ewellic.org
tinKeyboard.zip
--
Doug Ewell | Thornton, CO, US | ewellic.org
is more conservative.
That too. Good point.
--
Doug Ewell | Thornton, CO, US | ewellic.org
(b) it doesn't ship with Windows
Of course that is not a "luxury." Knowing that third-party options are
available, let alone free and easily installed ones, is the luxury.
--
Doug Ewell | Thornton, CO, US | ewellic.org
priority constituency,
but you'd be surprised.
> To like a particular layout does not mean to want to stick with it
> when anything better comes up. Userʼs choice is always respected.
See above regarding what users might like if only they had a choice.
--
Doug Ewell | Thornton, CO, US | ewellic.org
rmat and syntax to better "support
keyboard layouts from all major providers." Please point me to the part
I missed.
--
Doug Ewell | Thornton, CO, US | ewellic.org
flect what
vendors have released.
--
Doug Ewell | Thornton, CO, US | ewellic.org
s "apostrophe" will not save
> anything ; but the regular Unicode apostrophe U+2019 would need... 3
> bytes after the 1-byte basic Latin letter from ASCII (so it is
> worse !).
I did not see any evidence that this was something they ever considered
or cared about.
--
Doug Ewell | Thornton, CO, US | ewellic.org
longitude than Kazakhstan.
Most of the participants in this "apostrophe" thread appeared to be from
North America and Western Europe; I think you're the only one who
expanded that. I wasn't referring to the geographical or cultural makeup
of the list as a whole.
--
Doug Ewell | Thornton, CO, US | ewellic.org
h wait, you must be talking about AutoCorrect on Microsoft Word. Just
visit AutoCorrect Options and turn off that particular "replace as you
type" option, and be done with it.
--
Doug Ewell | Thornton, CO, US | ewellic.org
vens and the earth
in only 6 days was that there was no installed base to worry about.
--
Doug Ewell | Thornton, CO, US | ewellic.org
"standard keyboard," meaning an English-language one.
Nazarbayev may ultimately be persuaded to embrace ASCII digraphs, which
also meet this goal, but this talk about U+2019 and U+02BC will make
exactly zero difference in Kazakh policy.
--
Doug Ewell | Thornton, CO, US | ewellic.org
r his aegis) will ever use it).
Nevertheless, I wonder if it would be appropriate for Unicode or WG2, in
some capacity, to protest in some formal way against this recommendation
to arrogate an unassigned plane instead of using the PUA, which is the
correct place for unassigned characters.
--
Doug
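For reference, the private-use ranges are small enough to check directly. The sketch below hard-codes the three ranges (the BMP Private Use Area plus planes 15 and 16, minus their noncharacters); the function name is_private_use is hypothetical.

```python
PUA_RANGES = [
    (0xE000, 0xF8FF),       # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),     # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),   # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(code_point: int) -> bool:
    """Return True if the code point lies in a private-use range."""
    return any(low <= code_point <= high for low, high in PUA_RANGES)


print(is_private_use(0xE000))   # in the BMP PUA
print(is_private_use(0x40000))  # plane 4, not private-use
```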
On January 5, Mark Davis wrote:
> Doug, I modified my working draft, at
> https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY
> If that looks ok, I'll submit.
Sorry for the delay. The text substitutions look fine.
--
Doug Ewell | Thornton, CO, US | ewellic.org
R subdivisions, not just three, with the
understanding that the vast majority would not be supported by vendor
glyphs. It is unfortunate that, while the conciliatory name
"recommended" was adopted for the three, the intent of "exclusively
permitted" was retained.
--
Doug Ewell | Thornton, CO, US | ewellic.org
vention with
no basis in history or usage.
--
Doug Ewell | Thornton, CO, US | ewellic.org
" that show up in proposals from time to time, but have
never been used except by their inventors and to talk about them.
--
Doug Ewell | Thornton, CO, US | ewellic.org
ximum U+10,
four bytes) for almost fourteen years now.
--
Doug Ewell | Thornton, CO, US | ewellic.org
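The four-byte ceiling is easy to confirm with Python's built-in codec. This sketch encodes one sample code point from each length class; the sample characters are arbitrary choices.

```python
samples = {
    0x0041: "LATIN CAPITAL LETTER A",
    0x00E9: "LATIN SMALL LETTER E WITH ACUTE",
    0x20AC: "EURO SIGN",
    0x10FFFF: "last code point in Unicode",
}

# Number of UTF-8 bytes needed for each sample code point.
lengths = {cp: len(chr(cp).encode("utf-8")) for cp in samples}

for cp, label in samples.items():
    print(f"U+{cp:04X} {label}: {lengths[cp]} byte(s)")
```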
now the character encoding of
the data, and would not split multi-byte sequences in that encoding to
begin with.
--
Doug Ewell | Thornton, CO, US | ewellic.org
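When the data does arrive in arbitrary chunks, an encoding-aware decoder avoids the problem. A minimal sketch, assuming Python's standard codecs module: the incremental decoder holds back an incomplete multi-byte sequence until the rest arrives.

```python
import codecs

euro = "\u20ac".encode("utf-8")          # three bytes: E2 82 AC
first_chunk, second_chunk = euro[:2], euro[2:]

decoder = codecs.getincrementaldecoder("utf-8")()

# The first two bytes are an incomplete sequence, so nothing is emitted.
part_one = decoder.decode(first_chunk)
# The last byte completes the sequence and the character comes out whole.
part_two = decoder.decode(second_chunk, final=True)

print(repr(part_one), repr(part_two))
```

Decoding each chunk independently with bytes.decode() would instead raise an error on the first chunk.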
nverted to other
character sets, or indexed in search engines.
Font Awesome also includes some symbols that, we think, won't ever be
Unicode emoji, such as the Android, Apple, Bluetooth, and Windows logos.
--
Doug Ewell | Thornton, CO, US | ewellic.org
eirdnesses, much more so than with other technical topics. This scares
newbies and they walk away thinking every aspect of Unicode is complex
and weird.
--
Doug Ewell | Thornton, CO, US | ewellic.org
har is 227(*255)+131.
and:
> While UTF8 uses only 2 bytes to store data AL32UTF8 uses 2 or 4 bytes.
Unicode and UTF-8 have been around a long time by now. The fact that
there is still fake news like this out there, steering our less
Unicode-aware colleagues waaay down the wrong path, is disc
ithin the same font.
I thought that was one of the main reasons we had Unicode: so we would
no longer have to rely on particular fonts, or magic font behavior, to
get character identities we expected and could interchange reliably.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Michael Bear wrote:
> When are the code charts (http://www.unicode.org/charts/) going to be
> updated for Unicode 10.0?
They look fine to me.
--
Doug Ewell | Thornton, CO, US | ewellic.org
to-decipher
characters, and we were looking for insight from the original folks who
worked on them. We have no shortage of present-day expertise.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Martin J. Dürst wrote:
> Assuming (conservatively) that it will take about a century to fill up
> all 17 (well, actually 15, because two are private) planes, this would
> give us another century.
Current estimates seem to indicate that 800 years is closer to the mark.
--
Doug Ewell |
normal ideographs.
A new square compatibility character, if necessary, can be encoded after
the era name is chosen. It might be fast-tracked at that time, as the
Euro sign was, but there is no emergency about this and no reason to
invent any new encoding procedures or waive any existing ones.
--
Doug Ewell | Thornton, CO, US | ewellic.org
Richard Wordingham wrote:
> even supporting 6-byte patterns just in case 20.1 bits eventually turn
> out not to be enough,
Oh, gosh, here we go with this.
What will we do if 31 bits turn out not to be enough?
--
Doug Ewell | Thornton, CO, US | ewellic.org