See also http://www.unicode.org/review/index.html#pri33
Rick
Philippe,
It seems that something went wrong when installing the new
Unicode server, and that many messages that were initially
processed in the message queue were missing.
This is false! Please note that some people said they re-sent their
messages (which was also unnecessary). There was
African Oracle wrote...
The Yoruba Digital Consortium
www.africaservice.com/yorubadigital might push the idea of e, o with dot
below and grave or acute accent to make it easier for font and keyboard
developers to implement.
What do you think?
My opinion: it isn't easier, just different. The
Dele --
This mail is written with the Yoruba Keyboard that was rolled out
yesterday. Please just look at the issue raised earlier.
You sent Unicode plain text, not an image. If you look at this with
different fonts or different platforms, you get slightly different results.
If you
Since Peter Kirk wrote, on the Unicode list, I'll CC the list.
Peter Kirk wrote:
I sent several messages to the list between 16:20 and 16:30 GMT
which were simply lost.
You are wrong. They were not lost -- at least not on this server. Check
the archives. (OK, I've had some config trouble
around the world,
you'll probably see a few messages appear on the list once the new service
is up, even before I announce that the new server is on-line.
Regards,
Rick McGowan
Unicode, Inc.
are missing or corrupted please do not hesitate to contact me (off list
please). I will investigate.
Regards,
Rick McGowan
Unicode, Inc.
Michael Everson wrote...
The historical cut that has been made here considers the line from
Phoenician to Punic to represent a single continuous branch of
script evolution.
I think Rick McGowan wrote that sentence in UTR#3.
Indeed, I did. And I based my take on this history on the secondary
Peter Kirk wrote...
But on the other hand, the lack of a consensus among *any*
people that they have a need for an encoding does seem to imply that
there is no need for an encoding.
In this, you are utterly wrong, I'm afraid. We (in UTC) have seen
situations before where one group desires
.
(By the way, there is no need to discuss this move on the Unicode list. If
you have questions or concerns, please e-mail me off-list.)
Regards,
Rick McGowan
Unicode, Inc.
I've never managed to get either Notepad or Word to open Unihan.txt
Just use EMACS. Works fine.
Rick
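For anyone without EMACS at hand, a large tab-separated data file such as Unihan.txt can also be read one line at a time rather than opened whole in an editor. A minimal sketch, assuming the usual Unihan layout of U+XXXX, a tab, a field name, a tab, and a value, with # comment lines; the function name and any path passed to it are hypothetical:

```python
from typing import Iterator, Tuple


def iter_unihan_records(path: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (code point, field name, value) from a Unihan-style file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and '#' comment lines.
            if not line or line.startswith("#"):
                continue
            # Split on the first two tabs only; the value may contain spaces.
            code, field, value = line.split("\t", 2)
            # 'U+4E00' -> 0x4E00
            yield int(code[2:], 16), field, value
```

Because the function is a generator, only one line is held in memory at a time, which is the point when the file is too big for Notepad or Word.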
/press_release-cldr.html
Regards,
Rick McGowan
Unicode, Inc.
I am looking for some table of radicals
that I can show our customer to help support that claim.
I think maybe you're looking for Chapter 17, the Radical Stroke Index, but
it's not printed in the online edition of the book. You can always buy
the book...
Rick
Philippe...
Thanks for your concern but,
1. This isn't the forum for analyzing virus spam. Please you, and everyone
else, let's try to keep this list tolerably on topic.
2. A virus in fact *did* transit this system, and your analysis is rather
wrong in several points. But since this isn't
Asmus wrote:
Unfortunately in case of any proposed characters, web-sites can be
used as evidence only in a very limited way. [ ... ]
So what we learn from this site, is -unsurprisingly- that the cent
sign can be used as a fallback.
Yes, precisely, unless they have *pictures* of the things in
[EMAIL PROTECTED] wrote:
> The cedi sign should be of the size of the dollar sign ($) or the euro sign
> (EUR). The site you provided is using the cent sign. The Ghana web site uses a
> better version of the cent sign for the cedi. See
>
Rick Cameron asked...
It appears that Unihan.txt does not include mappings to Shift-JIS,
Right. It includes JIS mappings (for the Han portions of JIS).
and that the only file on unicode.org that contains mappings between
Shift-JIS and Unicode is in the 'obsolete' section.
Please read the
Peter Kirk wrote...
... I have a real requirement. The UTC has the power to meet my requirement,
and to do so rather simply. I am asking them to meet it.
Actually, you are not asking UTC anything. You are discussing the PUA on a
public-access mail list. There's a big difference. This *is*
Peter Kirk wrote...
I am undecided yet whether to make a formal proposal.
Ken seems to suggest that this would be a waste of time -
Yes. I also think it would be a waste of time, but...
although I can see some advantages in obtaining a formal rejection.
... I can also see some value in a
Oops.
Well...
*That* was a day early.
Rick
D Starner wrote:
But in practice I don't know of a single
program that allows you to change the properties of Unicode
characters without a recompile.
It's been a while since I've programmed with Apple's Cocoa environment,
but when last I looked, it dynamically loaded the property tables at
Unicode 4.0.1 has been released! The data files and documentation are
final and posted on the Unicode site. For details, see the version page for
Unicode 4.0.1 at:
http://www.unicode.org/versions/Unicode4.0.1/
Unicode 4.0.1 is an update version of the Unicode Standard. It adds no new
Michael Everson suggested this might be preferable:
PUA characters can be defined, locally and privately, according to
some protocol which will WORK if people write software to do what
they want
Yeah, probably preferable if you want to use the PUA. To get anything to
work, people have to
Can we take further discussions of censorship and proscribed words OFF this list?
Thanks,
Rick
Ernest Cline wrote...
Consider for example, a font that offered both of the common glyph
variants of PLUTO. At present, one would have to be encoded as
U+2647 and the other as a private use character, say U+E647.
Well, not necessarily. Depending on your system, one of them could just be
use the reporting link above
to generate comments for UTC consideration.
http://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
Regarding Moon script... and Braille...
And surely Braille could equally be
considered a cipher of Latin script (although the same symbols are also
used as a cipher of other scripts).
No, Braille is not a cipher of any other script. It is *not* simply
one-to-one mappable to/from the Latin
list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.
http://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
Philippe Verdy asked:
What is, in Unicode the BiDi behavior of PUAs?
Read the documentation, Philippe. UAX #9 and the UCD tell you some info
about the bidi behavior of PUA characters; and if you then go look at the
data file, in 30 seconds you can find:
E000;Private Use,
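The same lookup can be done without opening the data file by hand. A minimal sketch using Python's standard unicodedata module, which exposes the default property values from whatever UCD version ships with the interpreter; the helper name describe is made up for illustration:

```python
import unicodedata
from typing import Tuple


def describe(code_point: int) -> Tuple[str, str]:
    """Return (general category, bidi class) for one code point."""
    ch = chr(code_point)
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


# U+E000 and U+F8FF are the first and last private-use code points in the BMP.
for cp in (0xE000, 0xF8FF):
    print(f"U+{cp:04X}", describe(cp))
```

These are only the default values; a private agreement about PUA characters may assign different behavior, which a higher-level protocol would have to carry.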
What organization uses the ANARCHY SYMBOL? ;-)
That would be the DIS Organization.
Rick
See for example, http://www.4commongood.org/images/circlea.jpg
OK.
4) Determine a suitable code-page for the character
We don't do code pages; and you can actually skip this step if you
aren't sure where it might be able to go.
6) Create or find a computerised font representation of the
Marion Gunn wrote...
I do know my language is being badly served, however.
And I would conclude, given the discussion we've seen on this list, that
your language isn't being badly served by the Unicode Standard (or any
other character encoding), but by some fonts and their vendors.
You
John Snow asked:
I went to have a look at the archives on Yahoo Groups and can't seem to
find them! Can anybody give me the exact URL
The Yahoo Unicode group seems to have gone away. It disappeared a while
ago. I went looking for it a few weeks ago and couldn't find it. I have no
further
.
http://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
As long as we're on the topic, I have to weigh in on the conservative side
in this argument, with Ken Whistler. Use of the existing subscript
characters is generally bad practice. Adding more subscripts would be
adding to the bad practice, and yield even more different ways to express
the
Philippe --
The data to answer your question is on the web. Since the new list of
soft-dotted characters appears in the latest data file, and can be compared
with previous files to see what has changed, you could try looking at the
difference between this:
.
Regards,
Rick McGowan
Unicode, Inc.
Philippe (and others who might be looking),
I can't remember what was decided about the Soft-Dotted property of some
Latin
ligatures/digraphs with i or j in PR #11 (yes it was closed on last
August...).
The resolved issues are posted on the Resolved Issues page. It is linked
from the
Chris --
Note: I am not speaking officially, just giving my opinions.
> http://www.languagegeek.com/issues/ucas_unicode.html
Sorry I have no opinions at all about the major questions you are asking on the above page, and we probably need to involve some experts. The source documents for the
://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
are not automatically recorded
as input to the UTC. You must use the reporting link above to generate
comments for UTC consideration.
http://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
Peter,
I actually discussed this very issue with Debbie Anderson late last week.
She has raised the issue with TLG and with Coptic expert contacts. No
answer has yet been received from these experts (i.e., off the tops of
their heads, nobody seems to know what to do).
It is my feeling that
I don't see the relevance of Coptic experts to this issue.
Sorry... I meant other Greek expert contacts. (I'd just been reading
some Coptic-related docs.)
Rick
Three new Unicode Technical Notes are now available on the Unicode website.
The main Tech Notes page is here:
http://www.unicode.org/notes/
The new notes are:
#11 Representing Myanmar in Unicode: Details and Examples
by Martin Hosken and Maung Tuntunlwin
#12 UTF-16 for
This is your friendly reminder that the February UTC meeting is quickly
approaching. There are several public review issues open. So far, public
comment has been light. I hope you have all been working diligently on your
comments during the cold dark days of winter and are ready to spring
if
you have comments, please try to send them in soon.
Note: If you are a liaison representative, please forward this message as
appropriate within your organization.
Regards,
Rick McGowan
Unicode, Inc.
Excuse me, but the actual subject of this thread isn't Cuneiform anymore.
It has morphed to a discussion of Panther PUA codepoints, so can you all
please use a different subject line?
Thanks,
Rick
organization.
Feedback is welcome. You may submit comments directly by using our
reporting form at http://www.unicode.org/reporting.html
Regards,
Rick McGowan
Unicode, Inc.
Not to prolong this thread, but... Doug wrote:
There may be a parallel, however tenuous, in the Federalist Papers, a
series of articles that led to the drafting of the U.S. Constitution.
Sorry, factual error. Those papers did not *lead* to the drafting of the
Constitution, they were a set of
Theodore H. Smith asked:
> I've often wanted to type a symbol, that's like an exclamation mark,
> and a comma at the same time. That is, instead of the . on the bottom
> of a !, it has a , instead.
> Is there such a Unicode code point? Just out of curiosity! Or I
> suppose, is there a way to
.
Regards,
Rick McGowan
Unicode, Inc.
John Cowan suggested...
We will never come close to exceeding this limit. Essentially all new
combining characters are either class 0 or fall into one of the 200-range
positional classes.
Or 9, for viramas.
One take-home point is that there won't be any more fixed position
classes added
Of course, as usual, this is my opinion. UTC hasn't actually made any
proclamations about what will or won't be done in terms of the classes or
what kinds of classes might be assigned in the future.
Rick
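The classes mentioned in this exchange (0, 9 for viramas, and the 200-range positional classes) can be inspected directly. A minimal sketch using Python's standard unicodedata module; the three sample characters are chosen here for illustration and are not taken from the thread:

```python
import unicodedata

# Sample characters, one per kind of class discussed above.
SAMPLES = {
    "LATIN SMALL LETTER A": "a",             # ordinary base letter
    "COMBINING ACUTE ACCENT": "\u0301",      # positional class (Above)
    "DEVANAGARI SIGN VIRAMA": "\u094D",      # virama
}


def combining_class(ch: str) -> int:
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)


for name, ch in SAMPLES.items():
    print(f"{name}: {combining_class(ch)}")
```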
John Cowan suggested...
We will never come close to exceeding this limit.
The answer is also not quite so simple. Braille is not standardized across
languages, so Braille of different languages and countries, even though
they may use the same dot patterns, are not mutually comprehensible. There
is a lot of complication, not the least of which is six-dot versus
Jill Ramonsky asked on Nov 10:
My question went unanswered, so I'll ask it again - do I get a vote?
Hmm... I'm finally catching up on mail-list mail from this past week. The
short answer to your question is no, and others have said that. But, as
Philippe and others have said, you could join
Andrew, There isn't a CJK list.
Rick
CJK list ? Now if only there was a list of Unicode lists ...
We are pleased to announce the release of the 4.0.0 version of
Unicode Technical Standard #10: The Unicode Collation Algorithm
(UCA), which specifies a default sorting order and comparison
mechanism for all Unicode characters.
Major changes in this release include:
- The version of the UCA is
Philippe wrote...
The Unicode English name of the hacek character is caron (U+030C)
Just for the record: Actually, in English, we still call it a hacek.
Caron is a term apparently invented in an ISO character encoding
committee, and is *NOT* in current use at all in English. We call it a
Hello Tony,
Number one question: have you verified that DBArtisan 7 actually has
support for Unicode? I find nothing at all about Unicode support on QBS
Software web site, nor in the Embarcadero white paper, when I look at the
features of their products.
Rick
However, I want to
instructions for returning comments for UTC consideration.
Regards,
Rick McGowan
Unicode, Inc.
Jill Ramonsky wrote...
It seems to me that if 0x11 codepoints isn't a big enough space to fit in
the Klingon alphabet (and other alphabets which were similarly rejected)
then we need more codepoints. Simple as that.
Rejection of Klingon has *absolutely* nothing to do with space. Jill
John Cowan suggested:
The earth is finite and small, and there's no place for
large writing systems to hide from the eagle eyes of the Roadmappers.
Central Asia.
;-)
Rick
Before everyone goes jumping off the deep end with wanting to reserve more
space on the BMP for hyper extended surrogates or whatever, can someone
please come up with more than 1 million things that need to be encoded?
Our best estimate, for all of human history, comes in around 250,000. Even
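The arithmetic behind that question is short. A sketch, assuming the code space of 17 planes of 65,536 code points each (U+0000 through U+10FFFF) and taking the 250,000 figure from the message above as given; the constant and function names are illustrative only:

```python
PLANES = 0x11                      # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000    # 65,536
TOTAL_CODE_POINTS = PLANES * CODE_POINTS_PER_PLANE

# Estimate quoted in the message above, not an independent figure.
ESTIMATED_CHARACTERS = 250_000


def share_of_code_space(needed: int, total: int = TOTAL_CODE_POINTS) -> float:
    """Return the fraction of the code space that `needed` characters occupy."""
    return needed / total


print(TOTAL_CODE_POINTS)
print(f"{share_of_code_space(ESTIMATED_CHARACTERS):.1%}")
```

The total counts every code point, including surrogates, noncharacters, and private-use areas, so the space available for encoded characters is somewhat smaller than the raw total.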
Philippe Verdy wrote:
It's true that there is no plan in Unicode to encode something
else than plain text for existing or future actual scripts. But
ISO10646 objectives are also to offer support and integrate
almost all other related ISO specifications that may need a
unified codepoint
Michael wrote...
Someone calculated that at the present rate of character encoding
(1000 a year) it would take something like 700 years to fill the
whole range of characters
I think Ken and I have both done similar calculations, which are a matter
of record in the mail list archives, if
Florian Weimer asked:
http://www.unicode.org/review/
Maybe I'm missing something, but I still can't find any reference that
the Unihan.txt file will be released under a license that permits
redistribution (which has been announced in other documents).
Ah, you're right. It will have the
://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
François --
You might be interested to know that all of your recent mail has the
following header attached to it! Sounds to me like your outgoing server is
tagging mail, and it's getting things wrong.
Rick
X-Spam-Report: This mail is probably spam. The original message has been
myrkraverk...sourceforge wrote:
In a plain text environment, there is often a need to encode more than
just the plain character.
...
Since I'm using 64 bits, I call it Excessive Memory Usage Encoding, or
EMUE.
...
I thought of dividing the 64 bit code space into 32 variably wide
Michael wrote:
I was asked how I describe it briefly to laymen. And I usually say
Unicode is like a big, giant font that is supposed to contain all
the letters of all the alphabets of all the languages in the world.
Now, why do you suppose he removed *that* like and, like, left in all
the
). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.
http://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
Rajkumar S wrote:
The beta period closes on October 27, 2003. Since time is short,
developers are asked to please focus quickly on the data file review
if you have not yet done so.
I am a newbie wrt Unicode procedures. Does this mean that I can propose
some changes to Malayalam section
Curtis Clark,
Caviar, 10kg, FEED
Heh, heh... Don't you mean:
Caviar, Akg, FEED
;-)
Rick
What do hackers with non
Latin-based languages use for hex anyway?
They use 0-9, A-F, and a-f.
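As a small illustration of that digit set being built into a language, here is a sketch in Python, whose standard library lists exactly these characters in string.hexdigits and whose int() accepts either letter case; the helper name parse_hex is made up for this example:

```python
import string


def parse_hex(text: str) -> int:
    """Parse a hexadecimal string; upper and lower case are both accepted."""
    return int(text, 16)


# The digits 0-9 plus a-f and A-F.
print(string.hexdigits)
print(parse_hex("1F"), parse_hex("1f"))
```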
Hex is used mostly by programmers, mostly for computing, and mostly in
programming languages that have the digits and Latin letters built-in, and
that's what compilers expect to see. Hex doesn't
,
Rick McGowan
Unicode, Inc.
Jill Ramonsky asked...
What guarantee do I have that other Unicode characters will not be
added in the future which have the property Hex_Digit?
You don't have a guarantee of much in the future, except as indicated in
the Unicode stability policies.
Realistically, however, you're probably
Someone suggested...
It would be much simpler if each such character were clearly labelled in
the code charts etc. DO NOT USE!, and with its glyph presented on a grey
background or in some other way to indicate its special status.
Well, sure, I agree that it might be nice to somewhere
John Cowan remarked...
Of course it's
the *pint* (8 pints to a gallon) that is 16 or 20 fluid ounces.
Which explains to me why a pint of bitter in England seems quite so
enormous... well for a small Yank... ;-)
Rick
Hello all...
We are approaching the feedback deadline for some of the open Public
Review Issues. If you have any interest in sending comments for UTC
consideration, please see the Public Review page:
http://www.unicode.org/review/
for a list of the open issues, and instructions on
this
date. Please submit feedback with the reporting form at:
http://www.unicode.org/reporting.html
Regards,
Rick McGowan
Peter Kirk suggested...
Interesting and a little embarrassing that Unicode's own documentation
is not Unicode compatible!
I don't think it's very embarrassing... The Unicode consortium after all
doesn't produce book editing and typesetting software, we use other
peoples' software.
I think
The beta period for Unicode 4.0.1 has now started. Detailed information is
available on the beta page:
http://www.unicode.org/versions/beta.html
Beta versions of Unicode 4.0.1 data files are now available for public
comment here:
http://www.unicode.org/Public/4.0-Update1/
implement the Standard.
The complete list of available notes is accessible here:
http://www.unicode.org/notes/
Regards,
Rick McGowan
Unicode, Inc.
Jim Cloos asked
Or a haiku?
As long as we're off topic... A Haiku. Picking up on your 7 syllables, as
quoted by Ken, how about:
Unfortunately
Terra is not far behind
the eight ball of God
H... Well, that certainly lacks a seasonal
be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.
http://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
Raymond Mercier suggested...
http://wwwold.dkuug.dk/jtc1/sc2/wg2/docs/n2422.pdf
And these 6 Sogdian letters were accepted and do appear in Unicode 4.0.
http://www.gengo.l.u-tokyo.ac.jp/~hkum/pdf/SIE3.pdf
That document is apparently in some non-standard encoding and the French
accented
John C asked...
I would like to ask the old farts^W^Wrespected elders of the UTC
which principle they consider more important, abstractly speaking:
the principle that combining marks always follow their base characters
(a typographical principle), or that text is stored, with a few minor
Those interested in Tamazight might also be interested to know there has
been some preliminary work to encode it in Unicode. Copies of the
discussion documents are here:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1757.pdf
http://www.unicode.org/~rick/03076-tifinagh-discussion.pdf
Perhaps people
mail list are not automatically recorded as input to the UTC. You
must use the reporting link above to generate comments for UTC consideration.
http://www.unicode.org/consortium/distlist.html
Regards,
Rick McGowan
Unicode, Inc.
Peter Kirk wrote:
And then if (and I know it's a big if) the UTC agrees in principle to
allow a change to these combining classes, [...]
This just isn't going to happen, so people should look elsewhere for
solutions. I don't believe UTC could make such a decision and retain any
sort of
What would be the purpose of encoding these? I can't think of any.
They certainly don't need to be encoded as distinct characters to use
in a Last Resort font.
Mostly for documentation purpose,
Why bother to encode them as distinct characters? For purposes of
documentation isn't a good
Ostermueller, Erik wrote:
At unicode.org, when I click this link,
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=2
I'm expecting to see a little square GIF that displays U+2.
Instead, I see N/A.
This has now been fixed. Thank you for pointing out the error. The code
was
Ran across a place that has a number of language kits for Mac OS X,
including Burmese, Cherokee, Inuktitut, Kannada, Malayalam, Telugu, and
Tibetan. I haven't seen any blurbs about them anywhere...
http://www.xenotypetech.com
Rick
Can anyone tell me how to convert UTF-8 to UTF-16LE .
Funnily enough that's just what I'm coding right now.
The encodings are described in Chapter 3 of Unicode; UTF-8 is also described in
RFC 2279 http://www.ietf.org/rfc/rfc2279.txt and UTF-16 in RFC 2781
http://www.ietf.org/rfc/rfc2781.txt.
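If hand-coding the bit manipulation is not the goal, a language with built-in codecs can do the conversion by decoding to code points and re-encoding. A minimal sketch in Python; the function name is hypothetical, and this is not the poster's own implementation:

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE bytes (no byte order mark)."""
    # Strict decoding raises UnicodeDecodeError on ill-formed UTF-8.
    text = data.decode("utf-8")
    # The 'utf-16-le' codec writes little-endian code units and no BOM;
    # code points above U+FFFF become surrogate pairs.
    return text.encode("utf-16-le")


print(utf8_to_utf16le("A\u00E9".encode("utf-8")).hex())
```

A caller who wants a byte order mark at the start of the output would have to add the bytes FF FE separately, since the 'utf-16-le' codec does not write one.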
Sorry for the intrusion, but... If anyone knows an off-line way to get a
hold of Barry Caplan, could they please give him a call or send him a
letter, or knock on his door? He has been doing some standards archaeology,
and has sent messages to several people in the past few weeks, but his
Andrew C. West asked:
Where can the average proposal author browse section II, Character
Categories (needed for item B.3), clause 14, ISO/IEC 10646-1: 2000
(needed for B.4)
That is section 2.2 of the WG2 Principles and Procedures document. It is
available on-line. Go here:
Philippe wrote:
When I just look at the history of combining classes, they did not exist in
the first Unicode standard, and they still don't exist in ISO10646 as well.
This was a technology developed by IBM and offered for free to the community
Excuse me Philippe, but you are wrong. Please
Wow... How on earth did the subject line Major Defect in Combining
Classes of Tibetan Vowels turn into a discussion of Biblical Hebrew? At
least, people, if you're going to transmogrify the discussion, please use a
subject line such as Biblical Hebrew which someone already was wise
enough
Ken wrote...
I now like better the suggestions of RLM or WJ for this.
I'll have to disagree with Ken. I'm not so sure about either of these. I
don't think anyone has, in the past, considered what conforming or
non-conforming behavior would be for a RLM or WJ between two combining
marks.
Let me remind you: Talk on this list doesn't mean that the issue is
automatically brought up for UTC deliberation. If no documents are formally
submitted, nothing will happen.
After all the discussion of Tibetan, if anyone has a serious concrete
proposal for a specific change to the Unicode