From: Asmus Freytag [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 09, 2001 01:02 PM
At 01:43 PM 10/9/01 -0400, Gary P. Grosso wrote:
Because of Unicode's Han unification, I was under the impression that
to get both Traditional Chinese and Simplified Chinese to really look right
From: Sampo Syreeni [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 03, 2001 02:50 AM
On Wed, 3 Oct 2001, Marco Cimarosti wrote:
Alef Fatha Lam Sukun Qaf Fatha Alef Ain Kasra Dal Fatha Teh-Marbuta (Damma)
It strikes me as weird that none of the major news media have gone
From: John Cowan [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 03, 2001 09:14 AM
Ayers, Mike wrote:
I also recall when the U.S. government decided to switch from
Wade-Giles to Pinyin romanization of Chinese and muscled the media into
playing along. All that confusion
From: G. Adam Stanislav [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 01, 2001 12:07 PM
Send him a check instead. Every single US check I have ever seen had
a dollar sign printed to the left of the field where the numeric amount
is to be entered. They all use the same glyph
From: Edward Cherlin [mailto:[EMAIL PROTECTED]]
Sent: Saturday, September 29, 2001 05:55 PM
If we omit the later use of subtractive notation (iv=4, xc=90,
etc.), the original Roman numerals are exactly equivalent to
the Chinese abacus where each wire holds four beads below the bar
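The abacus comparison can be made concrete. The sketch below assumes purely additive numerals and a wire with one "five" bead above the bar plus four "one" beads below it. The bead above the bar is an assumption, because the quote is cut off. The function name additive_roman is hypothetical.

```python
# Sketch: purely additive Roman numerals, one "abacus wire" per decimal digit.
# Assumption: one five-bead above the bar and four one-beads below it.
WIRES = [  # (place value, one-symbol, five-symbol)
    (1000, "M", ""),  # no five-thousand symbol, so thousands stay below 5000
    (100, "C", "D"),
    (10, "X", "L"),
    (1, "I", "V"),
]

def additive_roman(n: int) -> str:
    """Return n (1..4999) without subtractive forms, e.g. 4 -> 'IIII'."""
    if not 1 <= n <= 4999:
        raise ValueError("this sketch only handles 1..4999")
    out = []
    for place, one, five in WIRES:
        digit = (n // place) % 10
        if digit >= 5:           # the bead above the bar is pushed down
            out.append(five)
            digit -= 5
        out.append(one * digit)  # up to four beads below the bar
    return "".join(out)

print(additive_roman(1984))  # MDCCCCLXXXIIII
```

Each decimal digit needs at most one five-symbol and four one-symbols, which matches the bead count of such a wire.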
From: Kenneth Whistler [EMAIL PROTECTED]
Sent: 01/09/26 2:23
Go man!
Actually, if he's half Jamaican, I think you have to say "Go mon",
which is also the Japanese for 50,000, yes?
/|/|ike
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 26, 2001 11:34 AM
Actually, if he's half Jamaican, I think you have to say "Go mon",
which is also the Japanese for 50,000, yes?
No, actually, it is Japanese for 5th question, although that seems to be
From: Asmus Freytag [mailto:[EMAIL PROTECTED]]
Sent: Sunday, September 23, 2001 02:24 AM
The typical situation involves cases where large data sets are cached in
memory, for immediate access. Going to UTF-32 effectively reduces the cache
by a factor of two, with no comparable
If you think you have the answer to all the problems, then you
don't know all the problems.
I tried to make a point, and apparently made it poorly. I will try
again. It seems that some people are arguing that UTF-16 is the ideal
solution for all computing, and that UTF-8 and
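The factor of two is easy to check. For text that stays in the Basic Multilingual Plane, each character takes 2 bytes in UTF-16 and 4 bytes in UTF-32. Supplementary characters take 4 bytes in both. The sketch below uses Python codecs, and the helper name encoded_sizes is illustrative.

```python
# Sketch: bytes needed for the same text in UTF-8, UTF-16 and UTF-32.
# The "-le" codecs are used so that no BOM is counted.
def encoded_sizes(text: str) -> dict:
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

samples = {
    "ASCII": "Unicode text",
    "Japanese (BMP)": "日本語のテキスト",
    "Extension B (plane 2)": "\U00020000\U00020001",
}
for label, text in samples.items():
    print(label, encoded_sizes(text))
```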
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 20, 2001 12:10 PM
Why not have as part of your kanji collation order, the Han
digits one through nine, in that order?
I believe that would be because they are not ordinarily sorted that way.
Why are
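The sketch below shows why the digits would need an explicit tailoring at all. It compares numeric order with plain code point order, which is what Python's default string sort uses. It is an illustration, not a collation implementation.

```python
# Sketch: Han digits one..nine in numeric order versus code point order.
HAN_DIGITS = "一二三四五六七八九"  # 1..9 in numeric order

numeric_order = list(HAN_DIGITS)
codepoint_order = sorted(HAN_DIGITS)  # default str comparison = code points

print("".join(codepoint_order))  # 一七三九二五八六四
print([f"U+{ord(c):04X}" for c in codepoint_order])

# A tailored key (value of each digit) restores the numeric sequence.
VALUE = {c: i for i, c in enumerate(HAN_DIGITS, start=1)}
print(sorted(codepoint_order, key=VALUE.__getitem__) == numeric_order)  # True
```

In code point order the nine digits are scattered. Any collation that wants one through nine in sequence has to say so explicitly.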
From: John Cowan [mailto:[EMAIL PROTECTED]]
[EMAIL PROTECTED] scripsit:
Oops! One of two Unicode 101 mistakes I made in the same day. Where was
my brain?
Unicode Ate Your Brain, of course! (See my tutorial at Orlando this year.)
Nah, UTF ate it!
From: Marcin 'Qrczak' Kowalczyk [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 14, 2001 02:11 AM
Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag [EMAIL PROTECTED] writes:
UTF-32 does have the same byte order issues as UTF-16, except that
byte order is recognizable without a BOM.
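The point rests on the fact that no code point exceeds U+10FFFF. In every UTF-32 code unit, the most significant byte is therefore 0x00 and the next byte is at most 0x10. The sketch below turns that into a heuristic. The function name is hypothetical, and a real detector would also check for a BOM and other cues.

```python
def guess_utf32_byte_order(data: bytes) -> str:
    """Heuristic: no code point exceeds U+10FFFF, so in every 4-byte unit the
    most significant byte is 0x00 and the next one is at most 0x10."""
    if len(data) % 4:
        raise ValueError("length is not a multiple of 4")
    units = [data[i:i + 4] for i in range(0, len(data), 4)]
    big = all(u[0] == 0 and u[1] <= 0x10 for u in units)
    little = all(u[3] == 0 and u[2] <= 0x10 for u in units)
    if big and not little:
        return "utf-32-be"
    if little and not big:
        return "utf-32-le"
    return "ambiguous"  # e.g. empty input or only U+0000

print(guess_utf32_byte_order("Hi".encode("utf-32-be")))    # utf-32-be
print(guess_utf32_byte_order("日本".encode("utf-32-le")))  # utf-32-le
```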
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 10, 2001 10:40 AM
The trouble with algorithms for sorting *text* is that often
an algorithm that purportedly sorts TEXT will really be
sorting at least partly by PRONUNCIATION. So is it really
sorting text?
From: J M Sykes [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 07, 2001 07:50 AM
The classic example is 'resume' and 'résumé'. These are, by now, two quite
distinct words, and the fact that there is no 'established' order is shown
I spell both resume and have never been
From: David Gallardo [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 07, 2001 10:07 AM
As a practical matter, you need to take the diacritics into account when
sorting, even in English where they (may or may not) have linguistic
significance; otherwise you'll get nondeterministic
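The sketch below uses a simplified two-level sort key: accents are ignored at first and only used to break ties. It is not the Unicode Collation Algorithm. It only shows how the tie-breaker makes the order of 'resume' and 'résumé' deterministic.

```python
import unicodedata

def strip_marks(s: str) -> str:
    """Primary key: decompose (NFD) and drop combining marks, 'résumé' -> 'resume'."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

def sort_key(s: str):
    # Compare without accents first; ties are broken on the full NFD form,
    # so equal-looking words always come out in the same order.
    return (strip_marks(s).casefold(), unicodedata.normalize("NFD", s))

words = ["résumé", "resume", "rest", "resumed"]
print(sorted(words, key=sort_key))  # ['rest', 'resume', 'résumé', 'resumed']
```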
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 06:06 AM
IMO, I correctly replied to Viranga's question and I've
no idea what you're talking about below.
Let me try to put it another way. What you said may have been
technically correct, but it
I have no idea what kind of stunt you're trying to pull.
/|/|ike
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 08:37 AM
I have no idea of what you're talking about.
Misha
On 30/08/2001 16:11:14 Ayers, Mike wrote:
From: [EMAIL
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 08:36 AM
Furthermore, Viranga's context appears to be XML, in which
case it *is* possible to encode *all* Unicode code points
using EUC (or ISO-8859-1 or ASCII or ...)
I ask again - where's the
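Here is a minimal sketch of the numeric character reference approach. First, markup-significant characters are escaped. Then Python's xmlcharrefreplace error handler writes every non-ASCII character as &#N;.

This works for character data and attribute values, not for element names. XML 1.0 also still excludes some code points (most C0 controls, surrogates, U+FFFE and U+FFFF) however they are written.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_ascii_xml_text(text: str) -> bytes:
    """Escape &, < and >, then write every non-ASCII character as &#N;."""
    return escape(text).encode("ascii", errors="xmlcharrefreplace")

sample = "résumé <中> \U00020000"
encoded = to_ascii_xml_text(sample)
print(encoded)  # b'r&#233;sum&#233; &lt;&#20013;&gt; &#131072;'

# Round trip: an XML parser turns the references back into the characters.
assert ET.fromstring(b"<p>" + encoded + b"</p>").text == sample
```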
From: Addison Phillips [wM] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 09:51 AM
4. However, you can use any other encoding, provided you tag the file
appropriately (so that the parser knows what the encoding is and can
translate it to its internal representation).
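Point 4 in practice: in the sketch below, the document is stored as ISO-8859-1 and says so in its XML declaration. That lets the parser (here Python's ElementTree, backed by expat) decode it. The element name note is arbitrary.

```python
import xml.etree.ElementTree as ET

text = "café"
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
            f"<note>{text}</note>").encode("iso-8859-1")
print(declared)  # the "é" is the single byte 0xE9, not UTF-8

root = ET.fromstring(declared)  # the parser honours the declared encoding
print(root.text == text)        # True

# The same bytes without the declaration are read as UTF-8 and rejected.
untagged = f"<note>{text}</note>".encode("iso-8859-1")
try:
    ET.fromstring(untagged)
except ET.ParseError as err:
    print("untagged ISO-8859-1 rejected:", err)
```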
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 10:42 AM
Interesting. My original reply is pasted in below. Please
tell me how you managed to arrive at your interpretation.
As I mentioned already, I misread your original reply, partially
From: Philipp Reichmuth [mailto:[EMAIL PROTECTED]]
This is not quite true. In fact, in the academic community (or at
least in linguistics and cultural sciences) it is established practice
to transliterate some terms. In a sinologist's work on Mao he'll
probably write Mao as Mao instead of
From: Thomas Chan [mailto:[EMAIL PROTECTED]]
e.g., if someone had asked the question "Can I write Cantonese with Unicode?"
1-2 years ago (before Unicode 3.1), the answer would have been "no" or "not
really". If it were asked today, the answer would be "yes". But try that
question today with
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]]
This is not correct: I have found the term "Han" or "hanzi" in all kinds of
literature, not only in Unicode documentation.
Hanzi is a loan word which I have also often seen (usually written
in italics as it should be), but I never said
From: Philipp Reichmuth [mailto:[EMAIL PROTECTED]]
On a side note, it would of course by now probably make sense to add Latin
as an alphabet for Chinese as well, since Hanyu Pinyin has been adopted as
some sort of official latinization system by the Chinese government, but
that's an entirely
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Also, I see that the script for Chinese is listed as "Han", not
"Chinese". Must we insist on confusing people?
The script in question is designated "Han" in the Unicode Standard,
and has always been so, in part because it is also used
From: Tex Texin [mailto:[EMAIL PROTECTED]]
So it must not be an NCR, EXCEPT in the seemingly rare case where
the string ]] appears in content AND that string is not being
used to indicate the end of a CDATA section.
How is that supposed to be read?
Simple. Since ]] is used to
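For reference, there are two usual ways to keep a literal ]]> from ending a CDATA section early. In ordinary content, write the > as &gt;. Inside CDATA, split the section around it. The sketch below shows both, and the helper name to_cdata is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Option 1: in ordinary character data, write the ">" of "]]>" as "&gt;".
print(escape("a[b[0]]>1"))  # a[b[0]]&gt;1

# Option 2: inside CDATA, close the section between "]]" and ">" and reopen it.
def to_cdata(text: str) -> str:
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

sample = "if (a[b[0]]>1) x = 2;"
wrapped = to_cdata(sample)
print(wrapped)  # <![CDATA[if (a[b[0]]]]><![CDATA[>1) x = 2;]]>
assert ET.fromstring("<code>" + wrapped + "</code>").text == sample
```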
From: John Cowan [mailto:[EMAIL PROTECTED]]
I think that any proposal to shrink the range of well-formed documents
is simply a nonstarter, regrettable as that is.
I had thought that one of the main goals of XML Blueberry was
mainframe compatibility. If so, won't they need to
From: Shigemichi Yazawa [mailto:[EMAIL PROTECTED]]
XML states: "Its goal is to enable generic SGML to be served, received,
and processed on the Web in the way that is now possible with HTML."
But, in my opinion, XML has outgrown its original goal way too
far. XML seems to be used in every
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
In a message dated 2001-07-13 5:27:41 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
@š‚¶‚イ‚¢‚Á‚¿‚á‚ñš
@Ž„‚͂낱‚¦‚ñ‚ç‚©‚ׂ³B
Robert, please stop this. It doesn't seem to be UTF-8 (that is, I can't copy
and
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Raw UTF-8      4,382,592
Zipped UTF-8   2,264,152  (52% of raw UTF-8)
Raw SCSU       1,179,688  (27% of raw UTF-8)
Zipped SCSU      104,316  (9% of raw SCSU, 5% of zipped UTF-8)
The data set is truly
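As a quick arithmetic check, the quoted percentages follow from the byte counts quoted above. The counts are copied from the message, and rounding to whole percent is an assumption about how the figures were reported.

```python
# Byte counts as quoted in the message.
sizes = {
    "raw UTF-8": 4_382_592,
    "zipped UTF-8": 2_264_152,
    "raw SCSU": 1_179_688,
    "zipped SCSU": 104_316,
}

def pct(part: str, whole: str) -> int:
    """Size of `part` as a whole-number percentage of `whole`."""
    return round(100 * sizes[part] / sizes[whole])

print(pct("zipped UTF-8", "raw UTF-8"))    # 52
print(pct("raw SCSU", "raw UTF-8"))        # 27
print(pct("zipped SCSU", "raw SCSU"))      # 9
print(pct("zipped SCSU", "zipped UTF-8"))  # 5
```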
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Those are MOJIBAKE for my SIG.
Which is what you deserve for not sending UTF-8. Until you upgrade
your mailer, your name will be @š‚¶‚イ‚¢‚Á‚¿‚á‚ñš.
:-p
1) I think that is mojibake for my name. It looks familiar.
From: Jungshik Shin [mailto:[EMAIL PROTECTED]]
What is mysterious is why this prompting (by MS OE) did not happen to Mike
Ayers when he replied to Peter's message (a Thai string in Windows-874),
adding some Chinese characters, while the MS OE (5.50.x) I tried certainly
prompted me to pick
From: Chris Wendt [mailto:[EMAIL PROTECTED]]
Replying in the charset of the original message is, in my view, reasonable
behavior: the recipient of your reply has the best chance to read the
message in the encoding in which the original message was sent. Changing the
encoding decreases the chance
From: Mark Davis [mailto:[EMAIL PROTECTED]]
Yes, that works fine. The Thai comes through clearly: ¡ÅÑ»ÁÒÍÂÙèáÅéÇ
Woohoo!!! UTF-8 party!!! ???!!!
/|/|ike
From: Ayers, Mike [mailto:[EMAIL PROTECTED]]
Let's try this again...
From: Jungshik Shin [mailto:[EMAIL PROTECTED]]
Nothing cryptic. As with others on this thread, your problem is
to mistake Windows-874 (legacy encoding for Thai) for UTF-8. Because
Windows-874 does NOT cover Chinese characters, they turned into
'?'. Judging from your message header,
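The mechanism is easy to reproduce. Python's "cp874" codec (Windows-874) covers Thai but not Han characters, so an encoder told to substitute writes "?" for each Chinese character. The sketch shows only the codec behaviour, not what any particular mail client does.

```python
message = "ภาษาไทย 中文"  # Thai followed by two Han characters

legacy = message.encode("cp874", errors="replace")  # unencodable -> b"?"
print(legacy.decode("cp874"))  # ภาษาไทย ??

utf8 = message.encode("utf-8")
print(utf8.decode("utf-8") == message)  # True: UTF-8 keeps both scripts
```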
From: Edward Cherlin [mailto:[EMAIL PROTECTED]]
The 'tsu' sign in reduced form is traditionally used in Japanese for
consonant doubling (chotto is written chi yo tsu to), but has been adapted
for glottal stops at the end of words.
Odd. I've always considered Japanese double
From: James Kass [mailto:[EMAIL PROTECTED]]
てんどうりゅうじ wrote:
Still haven't got the multiplication riddle solved, Mr. Kass?
Sorry, I didn't know it was required. Almost asked 'which
riddle?', but now notice the × in the signature portion as
follows...
らんま
From: Martin Duerst [mailto:[EMAIL PROTECTED]]
For people interested in new scripts, and new uses
of existing scripts :-)
http://www.google.com/intl/xx-hacker/
This looks like what is called L33T (elite) writing. It's popular
among online gamers. Kinda like computer pig latin...
From: Thomas Chan [mailto:[EMAIL PROTECTED]]
On Mon, 2 Jul 2001, Ayers, Mike wrote:
/|/|ike
The way you sign your messages is related to that, isn't it?
:) I've seen ]\/[, too.
Only related in spirit. I typed some slashes and bars together once
(I forget why - maybe
From: James [mailto:[EMAIL PROTECTED]]
There are already two Perl modules on CPAN that implement
ACE. These modules are already in use by ISPs for CJKV
iDNS registration. (One was packaged by me based on Paul Hoffman's
IMC code.) They are based on draft-ietf-idn-race-02.txt
So it seems
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
The definitions have problems that need to be fixed, though, and they're
less clear for UTF-16 than they are for UTF-8. I'm becoming inclined to say
that any argumentation for or against UTF-8s on the basis of whether it
runs into
From: Jianping Yang [mailto:[EMAIL PROTECTED]]
This will fix the following problem, for example:
A search engine searches for the character U-0001 in a UTF-8 string and
cannot find it. But when the UTF-8 is converted into UTF-16, it can find it
there, because ED A0 80 and ED B0
I sense impending laughter!
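Byte sequences like these come from encoding each half of a UTF-16 surrogate pair separately as UTF-8. That is what the proposed UTF-8s did, and the same form was later documented as CESU-8 in Unicode Technical Report #26.

The sketch below uses U+10000, chosen for illustration, whose UTF-16 form is D800 DC00. The function name utf8s_encode is hypothetical.

```python
def utf8s_encode(text: str) -> bytes:
    """Hypothetical helper: encode each UTF-16 code unit separately as UTF-8."""
    utf16 = text.encode("utf-16-be")  # supplementary chars become surrogate pairs
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    # "surrogatepass" lets Python write the surrogate code units, which a
    # conformant UTF-8 encoder would refuse.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)

ch = "\U00010000"
print(ch.encode("utf-8").hex(" "))  # f0 90 80 80
print(utf8s_encode(ch).hex(" "))    # ed a0 80 ed b0 80

try:
    utf8s_encode(ch).decode("utf-8")
except UnicodeDecodeError:
    print("a strict UTF-8 decoder rejects the surrogate-based bytes")
```

The two byte sequences differ. A byte-level search for the standard UTF-8 form will therefore not match data stored in the surrogate-based form, and vice versa. For BMP-only text the two forms are identical.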
Let's get this straight:
This is a claim that Unicode cannot navigate its way through the
political sensitivities of the East Asian peoples. It is coming from
someone who refers to those peoples as Orientals.
I quote: Unicode
From: Elliotte Rusty Harold [mailto:[EMAIL PROTECTED]]
At 4:15 PM -0500 6/4/01, Ayers, Mike wrote:
I have used Arabic numerals all my life without once thinking that I
was writing Arabic.
Really? I myself have been writing European numerals using the
Arabic-Indic place-value
Perhaps if "Han" is too unfamiliar a word to be used directly, "Sino" or
"Sinitic" could be used as translations to convey the same meaning without
using the overloaded term "Chinese" (language, culture, origin, ethnicity,
nationality, etc.), e.g., "Sino characters", "Sinitic characters".
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
For the Han characters, I have found in the past that people whose native
language does not use these characters usually refer to them as "Chinese".
Obviously (to us anyway), calling them "Chinese characters" is not adequate,
so we
However, some other people somewhere else may not like that even though
'Hanzi/Kanji/Hanja' are just different ways of pronouncing the identical
words written in 'Chinese characters' meaning 'Chinese characters'.
Let's not work based on imaginary fears. Unless someone can name
From: Thomas Chan [mailto:[EMAIL PROTECTED]]
I think the problem that Doug might be suggesting (correct me if I'm
wrong, Doug) is that Chinese is also the name of a language(s). The
I have used Arabic numerals all my life without once thinking that I
was writing Arabic. Doug
If you have this funny encoding please don't call it UTF8 because it is not
UTF8 and will only confuse users. You could call it OTF8 or something like
that but not UTF8.
How about WTF-8?
Sorry - I couldn't resist.
/|/|ike
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
I resisted calling it FTF-8 (Funky Transfer Format - 8), but if you want to
call it Weird Transfer Format - 8, I don't have any real objections.
Well, that's ONE possible translation of WTF...
/|/|ike
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
According to the proposal, UTF-8S and UTF-32S would not have the same
status: they wouldn't be for interchange; they'd just be for representation
internal to a given system, like UTF-EBCDIC (which, I think I heard, has
not actually
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]]
Doug Ewell wrote:
Peter has an excellent solution -- much better than trying to explain the
term CJK to ordinary people -- and I plan to use the term "East Asian" in
the future.
But, if by East Asian you mean languages written with
From: Herman Ranes [mailto:[EMAIL PROTECTED]]
Unfortunately, there are some errors in the UNHCRC 300 language collection.
Also not wanting to fan any fires, I wish to point out why I believe
the text from Genesis was chosen - most Bible translations (as far as I
know) are worked on
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
We want to be able to tell our characters apart.
I don't - I just want to be able to read them.
Oh, by the way, how do you tell LATIN CAPITAL LETTER P from GREEK CAPITAL
LETTER RHO? Sure, if you have context or if somebody
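Whatever a font does with their glyphs, the two letters are distinct characters. A short sketch using Python's unicodedata module shows this.

```python
import unicodedata

for ch in ("P", "\u03a1"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0050 LATIN CAPITAL LETTER P
# U+03A1 GREEK CAPITAL LETTER RHO

print("P" == "\u03a1")  # False: similar glyphs, different code points
```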
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Well, you want serifs. I, l, 1.
Those aren't differentiated by serifs - the differentiators are part
of the character. In any case, when I encounter a difficult bit like this,
I intend to do exactly what I did just now - look at it
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
I find that the most compelling reason is that many characters should be
rendered differently depending on user preferences. For example, a Japanese
user should have the Han rendered into Japanese characters except for ones
that do not
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
For those who do not know enough to tell the difference between
Kanji typography and Hanzi typography (and Hanja typography ;-) this yields
no benefit and forces a meaningless choice (which script that you can't read
do you
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]]
I wanted to forward it to these mailing lists, but the NYT copyright notice
is quite clear in that articles can only be downloaded for private use.
Hmmm - the NYT is based in the United States, where copyright laws have an
From: Keld Jørn Simonsen [mailto:[EMAIL PROTECTED]]
Is there then anyplace I can get a peek at the Extension B characters?
gibbeligobble gobbeligoble gibberish jest to keep sarasvasti
happy that there is something new in this message.
I think I need to write a few more
Bad web day...
From: Thomas Chan [mailto:[EMAIL PROTECTED]]
http://deall.ohio-state.edu/grads/chan.200/misc/xin_tangshu-76.3481.jpg
I believe the correct address (this is probably line-split) is:
From: John H. Jenkins [mailto:[EMAIL PROTECTED]]
Unfortunately, it isn't available yet. Unicode doesn't have a Plane
2 font, although we're actively working to get one.
Is there then anyplace I can get a peek at the Extension B characters?
TiA,
/|/|ike
From: William Overington [mailto:[EMAIL PROTECTED]]
Can there be found a possible usage that such a scheme would not support?
Finding just one would resolve the question.
I suspect that the whole issue is covered by Gödel's Incompleteness
theorem, which says (approximately)
Long after upgrading to Win2K, setting up all my fonts, and testing
everything, I've come to a conclusion: there are darn few Unicode text
messages on the Unicode mail list (i.e. characters are referred to by
codepoint, but the character itself is never included). In fact, I think
From: Eric Muller [mailto:[EMAIL PROTECTED]]
Ayers, Mike wrote:
Currently, when sending email or interpreting HTML, the content is tagged
for its encoding. Wouldn't PUA users simply use their own tag (say,
PUA-mike-1) instead of UTF-8? Am I missing something?
What we
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
A good point. A possible workaround would be a new plane-14 tag character.
I don't see this as a good solution. This is not because of any
objection to the plane 14 characters, but because I think the problem can be
handled well
From: David Starner [mailto:[EMAIL PROTECTED]]
Which, to the extent which this is true (show me how you plan to
handle The Art of Computer Programming or the Dragon book, for
example), is equally true of upper case. Capitalizing sentences is
redundant with punctuation, and any additional
From: David Starner [mailto:[EMAIL PROTECTED]]
THEN WHY WASTE A WHOLE BIT ON UPPER CASE? THEY CERTAINLY ARE NOT
NECESSARY AND I HAVE FREQUENTLY SEEN PEOPLE NOT USE THEM WHEN
AVAILABLE.
Good point. We didn't need 'em to get "Huckleberry Finn", so how
necessary can they be?
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
21 = 3 * 7
so could you "flatten" it to 7-bit ASCII?
Well, such flattening may cause the content to be misinterpreted.
However, if you are trying to get Unicode past some really old mailers, this
would be a reasonably efficient way
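Taking the 21 = 3 * 7 arithmetic literally gives the hypothetical scheme below, which is not any registered encoding. It packs each code point into three 7-bit bytes. Every byte is below 0x80, so the data passes 7-bit transports. Many of those bytes are ASCII control codes, though, which is one way such flattening can be misinterpreted.

```python
# Hypothetical sketch: pack each 21-bit code point into three 7-bit bytes.
def flatten7(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)  # at most 0x10FFFF, so it fits in 21 bits
        out += bytes([(cp >> 14) & 0x7F, (cp >> 7) & 0x7F, cp & 0x7F])
    return bytes(out)

def unflatten7(data: bytes) -> str:
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(
        chr((data[i] << 14) | (data[i + 1] << 7) | data[i + 2])
        for i in range(0, len(data), 3)
    )

sample = "A\u00e9\u4e2d\U0010ffff"
packed = flatten7(sample)
print(all(b < 0x80 for b in packed))  # True: 7-bit clean
print(unflatten7(packed) == sample)   # True: lossless round trip
```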
From: Edward Cherlin [mailto:[EMAIL PROTECTED]]
At 2:04 PM -0500 4/17/01, Ayers, Mike wrote:
From: Edward Cherlin [mailto:[EMAIL PROTECTED]]
One of the strongest benefits of Unicode is that it supports adequate
*monolingual* computing for the first time in any language
From: Edward Cherlin [mailto:[EMAIL PROTECTED]]
I would like to point out, again, that there is not now, and cannot
be, an 8-bit code page adequate to English, and the same is
necessarily true for every other language in modern use. More than a
century of typewriters and computers has
[EMAIL PROTECTED]
I hope that the claim of "multiple UTF-8 representations" does indeed refer
to glyphs, in the sense that Unicode contains both precomposed characters and
separable elements, halfwidth and fullwidth ASCII variants, etc. I hope it
does *not* refer to the nonconformant
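The two cases can be contrasted in code. The sketch shows a legitimate case of two byte sequences for canonically equivalent text (precomposed versus decomposed). It also shows an overlong sequence, which a conformant UTF-8 decoder must reject. The helper name is_valid_utf8 is illustrative.

```python
import unicodedata

def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")  # Python's UTF-8 decoder is strict by default
        return True
    except UnicodeDecodeError:
        return False

# Legitimate: canonically equivalent text, two different byte sequences.
composed, decomposed = "\u00e9", "e\u0301"
print(composed.encode("utf-8"), decomposed.encode("utf-8"))  # b'\xc3\xa9' b'e\xcc\x81'
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Nonconformant: 0xC0 0xAF is an overlong two-byte spelling of "/" (U+002F).
print(is_valid_utf8(b"\x2f"))      # True
print(is_valid_utf8(b"\xc0\xaf"))  # False
```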
From: Martin Duerst [mailto:[EMAIL PROTECTED]]
At 10:00 01/04/09 -0700, Carl W. Brown wrote:
I am wondering how, in the absence of a sub language, one should render
Chinese ruby. Mandarin ruby will not do a Cantonese reader much good. Can
I specify multiple ruby and then have one
Because of this very sentence, I have tested the Who command, and
guess what? On Fri, 23 Mar 2001 08:45:18 -0500 (EST), Listar
[EMAIL PROTECTED] happily sent me a list of 686 subscribers
to the Unicode list.
Paranoid, I just tried the same and got:
SNIP
List context changed to
From: John Wilcock [mailto:[EMAIL PROTECTED]]
While I'm at it, let me add another plea in favour of setting the
Reply-to: header to point back to the list [*only on messages which
lack this header*, allowing those who wish to receive personal replies
to set the header accordingly].
From: Gaute B Strokkenes [mailto:[EMAIL PROTECTED]]
On Thu, 22 Mar 2001, [EMAIL PROTECTED] wrote:
Your message has been rejected because it appears to quote
too extensively from other posts.
Since when has overquoting been a problem on this list? Does this
mean that
From: Roozbeh Pournader [mailto:[EMAIL PROTECTED]]
On Fri, 23 Mar 2001, Sean O Seaghdha wrote:
Please, please, please, can we not use this stupid [unicode] addition to the
subject line. I agree with all the points that have been made against it so
far. It's redundant, it
From: Michael Everson [mailto:[EMAIL PROTECTED]]
How much does a radical weigh?
I check in at about 200lb.
/|/|ike
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]]
Well, one wonders: could that president's madness possibly hide some
ingenuity?
/SNIP
Not really, I suspect. Unlike the situations you describe, American
foreign language training seems by and large to have the focus that
From: Misha Wolf [mailto:[EMAIL PROTECTED]]
What I want to know is whether we get a Unicode ring to wear.
Misha
Only if you join the UniClub(tm)! Kids who join the UniClub get a
secret decoder Unicode webring, Cima's magic pocket encoder (handy for
encoding magic pockets),
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On 03/09/2001 12:53:57 PM "Ayers, Mike" wrote:
Um... no. The UTF-32 CES can handle much more than the current
space of the Unicode CCS. As far as I can tell, it's good to go until we
need more than 32 bits to represe
If you really want to finish the job, there's always UTF-32, which
should do rather nicely until we meet the space aliens with the
4,293,853,186 character alphabet!
/|/|ike
P.S. No, they're not Klingons!
From: Ienup Sung [mailto:[EMAIL PROTECTED]]
I think we shouldn't advocate
From: Frank da Cruz [mailto:[EMAIL PROTECTED]]
Just to be sure: ISO 2022 has two modes, 7 bits and 8 bits, hasn't it?
And in 7-bit mode (I know it's obsolescent), C1 controls are not
supposed to be interpreted as controls, are they?
Nor as graphics.
Clarification: If
From: Michael Everson [mailto:[EMAIL PROTECTED]]
Oh, we've got a *proposal* for Klingon. It does not, however, appear
that it meets the criteria for use as well as Tengwar and Cirth.
Okay, I've finally gotta ask: what are Tengwar and Cirth? Klingon
I've heard of (and wish I
I advocate taking it one step farther, and referring to Unicode as
"21 bits and counting". Sure, it should be a long long time before more
space is needed, but it's a good idea to prepare the audience now. After
all, pretty much every ceiling ever established in computing has been
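This is where "21 bits" comes from. The highest code point, U+10FFFF, needs 21 bits, and Python's own str type enforces the same ceiling. The sketch only restates the current limits and makes no prediction about future ones.

```python
MAX_CODE_POINT = 0x10FFFF            # highest Unicode code point
print(MAX_CODE_POINT.bit_length())   # 21
print(MAX_CODE_POINT + 1)            # 1114112 code points = 17 planes x 65,536
print(2 ** 21)                       # 2097152 values a 21-bit field could hold

try:
    chr(MAX_CODE_POINT + 1)          # one past the ceiling
except ValueError as err:
    print("rejected:", err)
```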
From: David Starner [mailto:[EMAIL PROTECTED]]
The second example I would like to raise is the "Square Words" or "New
English Calligraphy"[6] (I don't know which name is more appropriate,
but I will refer to it hereafter as "NEC"), which is a Sinoform script.
NEC is a system where
From: John Cowan [mailto:[EMAIL PROTECTED]]
Ayers, Mike wrote:
After all, pretty much every ceiling ever established in computing has been
broken through, and there is no reason to believe that it won't happen again!
On the contrary. There *are* reasons to believe that it won't
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]]
This also casts some light on the fact that some fonts (notably JIS fonts)
have a big black box glyph at position 0x7F: it is probably for overwriting
a character already printed on paper, so that it cannot be read anymore.
I am looking for a tutorial or introduction to Kang Jie typing.
Kang Jie, sometimes called Chang Jie, as well as some other transliterations
(none of which I'm quite sure I'm spelling correctly, as I don't have
a reference handy), is a language and dialect independent method for
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Okay. Get out your copy of the lyrics to the Ranma
1/2 Complete Vocal Collection Vol. 1. Now look at
the lyrics to Ranbada Ranma (that's Track 12) and
tell me that the long vowel mark is not used with
hiragana.
The long vowel
From: Rick McGowan [mailto:[EMAIL PROTECTED]]
Mike Ayers wrote:
The last I knew,
computer-savvy Taiwan and Hong Kong were continuing to invent new
characters. In the end, the onus is on the computer to
support the user.
Yes, the computer should support the user, but... The
From: D.V. Henkel-Wallace [mailto:[EMAIL PROTECTED]]
At 06:30 2000-11-14 -0800, Marco Cimarosti wrote:
But my point was: not even Mr. Ethnologue himself knows exactly *which*
combinations are meaningful, in all orthographic systems. And, clearly, no
one can figure out which combinations
From: James E. Agenbroad [mailto:[EMAIL PROTECTED]]
Tuesday, October 31, 2000
You probably should check out what's done in India. They call a hundred
thousand a "lakh" and ten million a "crore".
I don't recall how
From: Shawn Halwes [mailto:[EMAIL PROTECTED]]
Can Japanese be effectively represented with only the Hiragana and Katakana
scripts?
"Effectively"? No. Katakana-only writing is just wrong, and
hiragana sans kanji (with or without katakana) is considered children's
writing. From
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
It seems that the proper solution is to use ISO 15924, which is part of the
new RFC 1766 sublanguage specifications. However, to my amazement, they do
not have separate script designations for traditional and simplified scripts.
Isn't there a more appropriate forum for the localization issues? I
might even subscribe. However, let's please move the topic to a more
appropriate place and let character encoding issues comprise at least half
the traffic around here.
Thanks,
/"\
With English, the problem with spell checking is quite different, and
different lists of words would not be as easy a solution: the en-US vs. en-GB
tagging does not seem to adequately cover the various differences such as
-ise vs. -ize, -our vs. -or, -re vs. -er, use of shall vs.
From: Arnt Gulbrandsen [mailto:[EMAIL PROTECTED]]
Are there valid reasons why the imperfect but comprehensive needs to be a
standard? I can see one reason for it _not_ to be a standard: A list can
be added to faster, so it's easier for a list to be truly comprehensive.
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
What is confusing is that sometimes "surrogates" refer to
certain code units (for UTF-16) that are reserved as code points,
and sometimes "surrogates" is used to refer to 'characters
on planes 01-10'. I think the latter is a misuse.
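In code-unit terms, the sketch below shows how a supplementary code point (planes 01 to 10 hex) maps onto a pair of surrogate code units from the reserved D800-DFFF range. The function name is hypothetical, and the result is cross-checked against Python's own UTF-16 encoder.

```python
def to_surrogate_pair(cp: int):
    """Return the (high, low) UTF-16 surrogate code units for a code point on planes 01-10."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogates")
    offset = cp - 0x10000  # a 20-bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

high, low = to_surrogate_pair(0x20000)
print(f"{high:04X} {low:04X}")  # D840 DC00

# Cross-check against Python's own UTF-16 encoder.
assert "\U00020000".encode("utf-16-be") == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
```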