On Sun, 17 Mar 2002, Andy Heninger wrote:
Tighten up the definition of an artificially constructed language to
be one that has never had native speakers, and you're there.
According to what I've heard, you have just thrown out both Esperanto and,
believe it or not, Klingon -- linguists do funky
On Sun, 17 Mar 2002, Asmus Freytag wrote:
Like all organizations, neither Unicode nor ISO has infinite resources.
Of course. I actually think both the Unicode Consortium and the ISO are
doing a fine job. The point was, if there was a problem prioritization
could solve, it still wouldn't be the
Markus Scherer wrote:
How about U+10?
It is a non-character, which gives it a high (unassigned
character) weight in the UCA. It is the highest code point =
the last character.
That is definitely not what I was looking for. It is an illegal codepoint,
while I was looking for a legal
Again - 'invalid data' and 'garbage'. Because you're thinking old data with
old definition. How about new data and old software?
Your approach means that if a new character is defined in say ISO 8859-8,
then all old software should report it as error. And all users must upgrade.
When (and if!)
Jungshik Shin wrote:
[Dan Kogai]
Dan the Man whose Name was Compromised by the Japanese
government (*)
(*) My parents wanted to name me 彈 (U+5F48), a classical
form, but it was not listed on the table of Kanjis allowed
for names so I was named U+5F3E.
Frankly speaking, I
I've changed the Subject: header because this thread is diverging.
On Saturday, March 16, 2002, at 11:43 , Thomas Chan wrote:
This particular case in a Chinese context wouldn't be respected.
One of the strongest taboos in business correspondence in Asia is to
misspell names. (Thanks to |
On Mon, 18 Mar 2002, Dan Kogai wrote:
However, if one is to pick over little details, then I still don't know what
U+5F3E is (in the context of Dan's name)--does the upper right corner have two
or three strokes?
Three. That's the only official 'Dan' with 'Bow' and 'Single'
-Original Message-
From: Dan Kogai [mailto:[EMAIL PROTECTED]]
As Kato pointed out, Unicode is more pro-programmers than
pro-users.
This is true of any character set. Users are not at all concerned with
how their script is stored. Most would prefer to never know about, hear
about,
On Monday, March 18, 2002, at 03:54 AM, Dan Kogai wrote:
In other words, neither Unicode nor any portable charset to date can be
used just to issue driver's licenses, much less exchange legal documents
electronically. This is a serious obstacle to digitizing government but
never discussed
On Sun, 17 Mar 2002, Miikka-Markus Alhonen wrote:
On 17-Mar-02 Curtis Clark wrote:
At 04:45 PM 3/16/02, Doug Ewell wrote:
But right away that definition includes not only Shavian, Tengwar,
Cirth, Klingon, and most of the contents of ConScript, but also
Ethiopic, Cherokee, Canadian
On Fri, 15 Mar 2002, Kenneth Whistler wrote:
Dan Kogai continued:
[snip]
His
favorite appears to be ISO-2022 but as Yet Another Perl Encoding Hacker,
ISO-2022 is a pain in the arse
You got that right!
--Ken
Monday, March
Jim Agenbroad asked:
Monday, March 18, 2002
Is ISO 2022 a character set (characters with their codes) or a complex
(painful?) means to announce and negotiate among various sets? I thought
it was the latter; am I missing something?
ISO 2022 is a
Lars Kristan responded:
Markus Scherer wrote:
How about U+10?
It is a non-character, which gives it a high (unassigned
character) weight in the UCA. It is the highest code point =
the last character.
That is definitely not what I was looking for. It is an illegal codepoint,
Doug Ewell recently said:
The closest I can come is something like a script that was invented,
generally by one person and in a relatively short period of time, rather
than evolving from existing scripts in a gradual and progressive
manner.
But right away that definition includes not only
Yes, just get the ruddy things into Unicode. Or else use Plan B:
Plan B is, the only valid characters in a personal name are kana.
I'm currently typesetting a book written in Jarai (var. Jrai, J'rai), a
tribal language used in the highlands of Vietnam; besides using characters
already accounted for in the Vietnamese script, the written Jarai language
uses several characters that are, to my knowledge, unique to it (and a
At 07:11 PM 3/17/02 -0800, Doug Ewell wrote:
The myth I was trying to communicate was that the process is totally
serial, such that if 3 weeks are spent on getting Tai Le encoded, CJK
Extension X is pushed back by 3 weeks.
Stated this way, it's of course overstated. You pointed out the
Jerome Hodges asked:
I'm currently typesetting a book written in Jarai (var. Jrai, J'rai), a
tribal language used in the highlands of Vietnam; besides using characters
already accounted for in the Vietnamese script, the written Jarai language
uses several characters that are, to my
John Jenkins wrote:
Basically, the place where I personally would draw the line is between
having a body of people (size left vague) who want to interchange data in
the script, or if there is a historic body of literature in the script.
I find myself very much in sympathy with this
Hi Jerome,
I'm currently typesetting a book written in Jarai (var. Jrai, J'rai), a
tribal language used in the highlands of Vietnam; besides using characters
already accounted for in the Vietnamese script, the written Jarai language
uses several characters that are, to my knowledge, unique
Lars Kristan suggested:
OK, another way of looking at all this. I believe you would accept three
options:
A - Reject the stream.
B - Drop the invalid data.
If you were defining an application concerned with security, and if
you had a clearly defined conversion you were performing, yes these
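As an illustration only (the thread does not mention Python), options A and B above roughly correspond to two of the standard error handlers in Python's codec machinery. This is a minimal sketch, assuming the input is meant to be UTF-8; the function names `decode_reject` and `decode_drop` and the sample bytes are hypothetical.

```python
# Sample input: the byte 0xFF can never occur in well-formed UTF-8.
raw = b"abc\xffdef"


def decode_reject(data: bytes) -> str:
    """Option A: reject the stream by raising on the first invalid byte."""
    return data.decode("utf-8", errors="strict")


def decode_drop(data: bytes) -> str:
    """Option B: drop the invalid bytes and keep the rest."""
    return data.decode("utf-8", errors="ignore")


try:
    decode_reject(raw)
    rejected = False
except UnicodeDecodeError:
    rejected = True

dropped = decode_drop(raw)
```

Which handler is appropriate depends on the application, as the reply above suggests.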
And it seems to have overlooked the fact that not all conversions
are defined on multi-byte character encodings to Unicode.
Grr. What I meant of course was:
And it seems to have overlooked the fact that not all conversions
are defined on single-byte character encodings to Unicode.
--Ken
Vladimir Ivanov noted:
Old Persian and Avestan are closely related ancient languages that usually
go side by side. If a linguist refers to an Old Persian example, he must
show its Avestan form or his work would be considered to be incomplete ...
[ lots of good information followed ]
What
Plan B is, the only valid characters in a personal name are kana.
I meant, *in Japan*.
Stefan
John Jenkins wrote:
Basically, the place where I personally would draw the line is between
having a body of people (size left vague) who want to interchange data in
the script, or if there is a historic body of literature in the script.
So the script of the Codex Seraphinianus would NOT
Timothy Partridge [EMAIL PROTECTED] wrote:
If I went to a community whose language doesn't have a written form and
convinced them that Tengwar would be an ideal way of recording their
culture, would that make Tengwar more legitimate? Or cause people to
regard it as a higher priority?
Yes.
Kenneth Whistler [EMAIL PROTECTED] wrote:
The proposals will likely languish until Michael Everson discovers
he has some free time on his hands to pursue consensus with
academic Iranianists and other interested parties, or until
someone from that community emerges as a champion to push the
Kenneth Whistler [EMAIL PROTECTED] wrote:
b-stroke: 0180 ~ 0180 id.
...
(all in both lower- and upper-case variants).
Substitute out the uppercase for the relevant base characters, and
you have it.
There's a problem, though. There is no uppercase form of U+0180
Sorry for the belated response to this. I hope it is still relevant.
Patrick T. Rourke [EMAIL PROTECTED] wrote:
I would think you could simply use the version number of the Unicode
Standard. For example, the use of Tagalog would have been conformant to
this proposed PUA registry until
On Mon, Mar 18, 2002 at 08:59:15PM -0800, Doug Ewell wrote:
You are not going to find many fonts on the Web that contain PUA characters.
There are a few Shavian fonts using the ConScript PUA encoding.
--
David Starner - [EMAIL PROTECTED]
It's not a habit; it's cool; I feel alive.
If you
Hello, Unicoders!!
About the transliteration of the Hebrew letters for the Ladino (Judeo-Spanish, uemo) language, an acceptable system for that is one used by Padre (=Father) Pascal Recuero (which looks Esperanto-like, as can be seen just below):
alef: ' (apostrophe)
beth-daghesh: b