OFF-TOPIC character set usage statistics ???

2001-08-01 Thread John Wilcock
I seem to remember that someone recently posted a link to some statistics on character set usage, but I can't seem to find it in my old messages. Can anyone help? John.

RE: Errata in language/script list: xUSSR languages

2001-08-01 Thread John Hudson
At 09:05 7/31/2001 -0500, Hohberger, Clive wrote: Tundra Nenets, together with Forest Nenets, forms the Nenets group of languages, which belongs to the Samoyed branch of the Finno-Ugrian (Uralic) language family. Nenets was formerly known as Yurak or Yurak Samoyed, both now obsolete. Last year,

Re: Errata in language/script list: xUSSR languages

2001-08-01 Thread Kairat A. Rakhim
James Kass wrote, Kairat A. Rakhim wrote, I have notes about languages of former USSR included in the list. In the 1930s almost all of them were written in Latin script known as 'Unified New Turkic Alphabet', or in its derivatives (Common Northern Alphabet etc). It should be

Transliteration

2001-08-01 Thread Mark Davis
On http://www.macchiato.com/slides/transliteration_in_icu.ppt, I have slides for my conference talk on transliteration. For those people having an interest in transliteration, I would appreciate any feedback. Mark P.S. The slides are in PowerPoint. If someone is interested and can only

Arithmetic in Transliteration

2001-08-01 Thread てんどうりゅうじ
From what I have read on this list, a Roman-to-Hangul translation would be GREATLY aided by the use of arithmetic on Unicode values. Is this in there too? Arithmetic on Unicode values? じゅういっちゃん (Juuitchan) Well, I guess what you say is true, I could

Unicode/font questions.

2001-08-01 Thread Richard, Francois M
Since Win2000 and NT are native Unicode, is it true to say that any use of a non-Unicode font (in fact most of the fonts on Windows. And in particular Asian font like MS Mincho, MS Gothic) in a Unicode application will generate a conversion WideCharToMultibyte (to convert the Unicode text to the

Re: Errata in language/script list: xUSSR languages

2001-08-01 Thread Peter_Constable
Peter Constable thought maybe a couple and you illustrate no additional characters required. I'll split the difference and say one. With the lower case... it's a couple, isn't it? I meant the upper / lower of what I think Marco proposed as 413+321, but I'm not sure these should be

RE: Transliteration

2001-08-01 Thread てんどうりゅうじ
I just saw the slides. That cursor-backup looks very tricky. So for someone doing the kana-to-Hepburn, you might have this: (here, "o^" means o-with-circumflex) (bakayarou disclaimer: I make lots of errors) こ→k|お そ→s|お と→t|お . おう→o^
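
As a rough illustration of the cursor-backup idea in this message, here is a minimal Python sketch. It is not ICU's rule engine or syntax. It assumes that a rule such as "こ→k|お" emits "k" and puts "お" back in front of the cursor, so that a later rule (here "おう→o^") can still match it. The single-kana fallback rules for お and う, the rule ordering, and the name `transliterate` are assumptions added for the example.

```python
# Each rule is (source, output, reprocess). `reprocess` is text placed back
# in front of the cursor so that later rules can match it again; it is a
# simplified stand-in for the "|" marker in the quoted rules.
RULES = [
    ("おう", "o^", ""),   # long o; "o^" stands for o-with-circumflex
    ("こ", "k", "お"),    # emit k, then back up over お
    ("そ", "s", "お"),
    ("と", "t", "お"),
    ("お", "o", ""),      # assumed fallback rules, not in the message
    ("う", "u", ""),
]


def transliterate(text: str) -> str:
    """Apply the first matching rule at the cursor until the input is used up."""
    out = []
    pending = text
    while pending:
        for source, output, reprocess in RULES:
            if pending.startswith(source):
                out.append(output)
                # Consume the match, then re-queue the backed-up text.
                pending = reprocess + pending[len(source):]
                break
        else:
            # No rule matched: copy one character through unchanged.
            out.append(pending[0])
            pending = pending[1:]
    return "".join(out)


print(transliterate("こう"))      # ko^
print(transliterate("とうそう"))  # to^so^
print(transliterate("こと"))      # koto
```

Because the vowel is handed back to the matcher, one long-vowel rule serves every consonant row instead of needing a separate rule per syllable.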

RE: Re[2]: Errata in language/script list

2001-08-01 Thread Ayers, Mike
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]] This is not correct: I have found the term Han or hanzi in any kind of literature, not only on Unicode documentation. Hanzi is a loan word which I have also often seen (usually written in italics as it should be), but I never said

RE: Unicode/font questions.

2001-08-01 Thread Murray Sargent
Actually fonts on Windows are normally Unicode based (including MS Mincho and MS Gothic) and most have in addition some codepage access. So there is neither a perf hit nor a codepage problem in using such fonts on NT, Win2000 and WinXP. These considerations are orthogonal to OpenType. Murray

RE: Arithmetic in Transliteration

2001-08-01 Thread Marco Cimarosti
11 wrote: From what I have read on this list, a Roman-to-Hangul translation would be GREATLY aided by the use of arithmetic on Unicode values. Is this in there too? Arithmetic on Unicode values? I guess you mean this: http://www.unicode.org/unicode/reports/tr15/#Hangul _ Marco
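
The "arithmetic on Unicode values" asked about here can be sketched in Python as below. The constants are the ones published in the Unicode Standard's Hangul syllable decomposition algorithm, which the linked report describes. The function name `decompose_hangul` is made up for this example, and the sketch handles one precomposed syllable at a time.

```python
# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable; leave as-is
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""  # index 0 means no trailing consonant
    return lead + vowel + trail


print([hex(ord(c)) for c in decompose_hangul("한")])  # ['0x1112', '0x1161', '0x11ab']
```

The result should agree with `unicodedata.normalize("NFD", ...)` from the Python standard library for any single Hangul syllable.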

RE: Re[2]: Errata in language/script list

2001-08-01 Thread Thomas Chan
On Wed, 1 Aug 2001, Ayers, Mike wrote: From: Marco Cimarosti [mailto:[EMAIL PROTECTED]] BTW, I notice that a single Chinese entry is listed. This should probably be split in several entries for the various Chinese languages (or dialects, e.g. Mandarin, Cantonese, Hakka, etc.). This

Re: Arithmetic in Transliteration

2001-08-01 Thread Mark Davis
We have specialized transliterators that are algorithmic. See http://oss.software.ibm.com/icu/apiref/class_Transliterator.html For the specific case of Hangul, what we have is an algorithmic Hangul-Jamo converter, and a rule-based Jamo-Latin converter. The Hangul-Latin transliterator internally
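
To make "algorithmic" concrete, here is a small Python sketch of the jamo-to-syllable direction using the standard Unicode Hangul composition arithmetic. It is not ICU code. The name `compose_hangul` and the choice to raise `ValueError` on out-of-range input are assumptions for the example.

```python
# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Combine conjoining jamo (L, V, optional T) into one Hangul syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a modern leading consonant / vowel jamo pair")
    if trail and not 0 < t_index < T_COUNT:
        raise ValueError("not a modern trailing consonant jamo")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


print(compose_hangul("\u1112", "\u1161", "\u11ab"))  # 한
```

A rule-based stage such as the Jamo-Latin converter mentioned in the message would then only need rules for individual jamo, not for all 11172 syllables.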

RE: Unicode/font questions.

2001-08-01 Thread Chris Pratley
In fact both MS Mincho and MS Gothic contain far more characters (and glyphs) than appear in JIS X 0208 and a few more than JIS X 0212, so these already go far beyond code page based (code page 932 covers essentially JIS X 0208). In fact much Microsoft software no longer officially supports
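
One way to see the limits of code page 932 is to check whether a string survives encoding to it. The Python sketch below assumes that Python's built-in `cp932` codec is a close enough model of the Windows code page for illustration. The helper name `representable_in_cp932` is hypothetical.

```python
def representable_in_cp932(text: str) -> bool:
    """Return True if every character of `text` can be encoded in code page 932."""
    try:
        text.encode("cp932")
    except UnicodeEncodeError:
        return False
    return True


print(representable_in_cp932("日本語"))  # True: these kanji are in JIS X 0208
print(representable_in_cp932("한국어"))  # False: Hangul is outside code page 932
```

A Unicode cmap in the font avoids this limit, because glyph lookup does not have to go through the code page at all.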

Re: Transliteration

2001-08-01 Thread Mark Davis
Yes, you could use backup in that way, if you wanted. In that case, though, it doesn't buy you much. Where it is more useful is for the kyo, gyo,... case. For those not familiar with Japanese, there are a large number of cases that follow the same pattern: kyo maps to a large katakana for

Re: Unicode/font questions.

2001-08-01 Thread Peter_Constable
Since Win2000 and NT are native Unicode, is it true to say that any use of a non-Unicode font (in fact most of the fonts on Windows. And in particular Asian font like MS Mincho, MS Gothic Your question has a mistaken premise: the vast majority of TTF fonts on Windows *do* have Unicode cmaps.

FW: Unicode in Asia Question

2001-08-01 Thread Magda Danish (Unicode)
-Original Message- From: NORIEGA,DANNY (A-HongKong,ex1) [mailto:[EMAIL PROTECTED]] Sent: Wednesday, August 01, 2001 2:56 AM To: '[EMAIL PROTECTED]' Subject: Unicode in Asia Question Hi: My company is planning to implement 16-bit Unicode. The proposal is to go strictly and solely

Re: Unicode/font questions.

2001-08-01 Thread Jungshik Shin
On Wed, 1 Aug 2001, Richard, Francois M wrote: Since Win2000 and NT are native Unicode, is it true to say that any use of a non-Unicode font (in fact most of the fonts on Windows. And in particular Asian font like MS Mincho, MS Gothic) in a Unicode application will generate a conversion

RE: Transliteration

2001-08-01 Thread Marco Cimarosti
Mark Davis wrote: Yes, you could use backup in that way, if you wanted. In that case, though, it doesn't buy you much. Where it is more useful is for the kyo, gyo,... case. Another case on the top of my mind is the transliteration (or whatever it is) of Indic scripts. This mechanism may

RE: Unicode in Asia Question

2001-08-01 Thread Addison Phillips [wM]
Hi Danny, Implementing Unicode is a good thing for creating multilingual applications and for supporting code that is distributed worldwide (or at least to a number of locales). Based on your questions below, you probably should start with the Unicode FAQ (on the website) and with the standard

RE: Unicode in Asia Question

2001-08-01 Thread Carl W. Brown
Danny, I am currently working on xIUA. This is sample code that you can integrate into your application to help you interface to ICU. http://oss.software.ibm.com/icu/ It has added functionality specifically tailored for Web servers. It will allow you to develop application Unicode that will

Re: Unicode in Asia Question

2001-08-01 Thread Markus Scherer
You can go and try using a web server that works internally in 16-bit Unicode (UTF-16) and serves web pages in many languages in either UTF-8 (default) or many other codepages. (Now that ICU was mentioned already...) Go to
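
The pattern described here (UTF-16 internally, UTF-8 or another codepage on the wire) comes down to a decode followed by an encode. The Python sketch below is only an illustration of that step and is unrelated to the server's actual code. The function name `encode_for_response` and the little-endian byte order are assumptions.

```python
def encode_for_response(utf16_bytes: bytes, charset: str = "utf-8") -> bytes:
    """Convert internal UTF-16 (little-endian, no BOM) text to the requested charset."""
    text = utf16_bytes.decode("utf-16-le")
    # Raises UnicodeEncodeError if the target charset cannot represent the text.
    return text.encode(charset)


internal = "é".encode("utf-16-le")
print(encode_for_response(internal))                # b'\xc3\xa9'
print(encode_for_response(internal, "iso-8859-1"))  # b'\xe9'
```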

FW: Codepage

2001-08-01 Thread Magda Danish (Unicode)
Unicoders, I have difficulty understanding this person's request but would like to help. Can you? Thanks. Magda. -Original Message- From: korlvinke [mailto:[EMAIL PROTECTED]] Sent: Wednesday, August 01, 2001 1:59 PM To: Magda Danish (Unicode) Subject: Re: Codepage I

Re: Codepage

2001-08-01 Thread Michael \(michka\) Kaplan
Microsoft's Euro story can be seen at: http://www.microsoft.com/europe/euro/ Specifically, the Windows info is at http://microsoft.com/windows/euro.asp There is no way to arbitrarily add code points to a Windows code page, though. Either you have the patch or the newest file, or you do not. If

Transliteration Slides in HTML

2001-08-01 Thread Mark Davis
I put an HTML version on http://www.macchiato.com/slides/transliteration_in_icu.htm. BTW, I notice that a link on slide 21 is wrong. The text is ok, but the internal link is wrong: should be http://oss.software.ibm.com/icu/demo Mark — πάντων μέτρον ἄνθρωπος — Πρωταγόρας