RE: CJK question
Have you tried the MS-Office Proofing Tools? It contains a font for GB 18030 characters as well as an updated version of the MS IME. The install program forces you to have MS-Office XP installed. However, I've seen comments on the web suggesting that you can use it without XP.

--Erik Ostermueller

-----Original Message-----
From: Allen Haaheim [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2003 1:48 PM
To: [EMAIL PROTECTED]
Subject: CJK question

Hello,

First my apologies if I have missed something already available on the Unicode website that I should have already known, as well as for my total lack of expertise in the fields commonly under discussion in this forum. If anyone knows of a more appropriate place towards which I should direct my woes, I would be glad to hear of it.

I work with Classical Chinese texts which contain numerous characters not in any character sets I use (Arial Unicode MS; CJK Unified Ideographs), or know of. I was delighted to see that most--if not all--of the graphs I need are in Unicode charts. However they all seem to be images, so it seems that I cannot use them. I need to be able to both input and display the rare graphs in my MS system. As far as I can tell, what I may need to do is add characters to my existing character sets, or get new character sets altogether. For example, in a website I commonly use to view reliable, annotated editions of source texts, www.chant.org (Chinese Ancient Texts Center at the Chinese U of Hong Kong), I frequently encounter blank spaces where rare graphs are located. (I have downloaded all their fontpacks.)

Third-party layovers with "font-maker" utilities such as Twinbridge or Chinese Star are too unstable, in my experience causing frequent crashes. Furthermore they are not always convertible to or compatible with Unicode.
The most stable setup I have found for a PC is a localized version of Win2000, however the MS input methods for Chinese are much too cumbersome and slow, cover only a portion of the language, of course have no font-maker utility, and do not even seem to be able to retrieve all the graphs stored in my existing character sets.

Besides Chinese and Japanese graphs, I also need to occasionally enter Latin or other letters with diacritical marks, such as romanized Sanskrit (for Buddhist terms), IPA symbols, and others, but I cannot always find them in the character sets I have. Some of you should get a chuckle (or a groan) to learn that I'm manually searching through character sets in "Insert (drop-down); Symbol..." in MS Word 2000 to find them. Of course their location gets memorized after a while--but this gets more difficult with CJK graphs! (They're not true ideographs, by the way--they are mostly logographic.)

I realize there may be no easy solution for my problem, but any advice would be greatly appreciated.

From a frustrated sinologist,
Allen Haaheim
Re: DBCS and Unicode 3.1
Thanks, all, for your responses. They helped me to better phrase my question: Does anyone know of a way to process GB 18030 data in COBOL on MVS? Thanks, --Erik Ostermueller
DBCS and Unicode 3.1
Hello all,

In the past, DBCS could support characters no larger than 2 bytes. Correct? Now that Unicode 3.1 has broken the two-byte barrier, is there a corresponding update for DBCS?

I've been getting most of my DBCS info from these URLs:
http://oss.software.ibm.com/icu/userguide/conversion-data.html
http://www-919.ibm.com/developer/dbcs/guide3.html#DBCS

Thanks,
Erik Ostermueller
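To make the "two-byte barrier" in the question above concrete, here is a minimal sketch that encodes one BMP character and one supplementary-plane character (U+20000, from the CJK extension added in Unicode 3.1) and reports the byte lengths. It assumes Python's standard codecs purely for convenience, and the helper name `encoded_lengths` is made up for this illustration. It says nothing about IBM DBCS code pages themselves.

```python
# Two sample characters: one inside the BMP, one outside it.
samples = {
    "U+4E2D (BMP)": "\u4e2d",
    "U+20000 (Plane 2)": "\U00020000",
}


def encoded_lengths(ch):
    """Return the byte length of one character in a few encodings."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "gb18030")}


for label, ch in samples.items():
    print(label, encoded_lengths(ch))
```

With these codecs the BMP character fits in two bytes in both UTF-16 and GB 18030, while U+20000 needs four bytes in each, which is the situation a strictly two-byte scheme cannot represent.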
RE: {SPAM?} RE: CJK test data
All,

This is for all those interested in following my search for GB 18030 test data. I'm having another one of those 'senior moments'. I could have sworn that I sent this to the list already, but can't find it anywhere. Forgive me if I've already sent this info.

I found some GB 18030 test data on the website of a private consultancy.
http://www.chinesization.com/gb18030_standard.htm
Download the testdata.zip from this page. My Chinese-speaking colleague, Tianmiao Hu, informs me that this _is_ the official test data. Can anyone confirm or deny this? It would be nice to find this same data from a more official source. Instructions in English would be helpful, also. :-)

--Erik

-----Original Message-----
From: Anthony Fok [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 08, 2003 3:35 AM
To: Ostermueller, Erik
Cc: [EMAIL PROTECTED]
Subject: Re: {SPAM?} RE: CJK test data

On Fri, Feb 07, 2003 at 04:19:04PM -0600, [EMAIL PROTECTED] wrote:

Markus wrote:
For general test data for determining support of GB 18030 I suggest to contact the Chinese government and its standards agency. They have defined a certification procedure, and I assume that the data and procedure are available. I have no direct contacts for this myself.

Here is contact info from an 18030 article by Tom Emerson.
http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
Hmmm. No url. No email address. This will be interesting.

Standard Conformity Testing Center for Information Products
#1 Andingmen Dong Da Jie
Beijing, China
Tel: 84029573 or 84029792
Fax: 64007681

Tom Emerson's article is news to me, and I find it very helpful. :-)

There _is_ an e-mail address that interested parties could try, that of

CHEN Zhuang
Chinese IT Standardization Technical Committee
Chinese Electronics Standardization Institute

His e-mail address is included in the Application of IANA Charset Registration for GB18030:
http://www.iana.org/assignments/charset-reg/GB18030

I suppose Mr. Chen does not work in the Testing Center, but he may be able to provide some other pointers. :-)

Cheers,
Anthony

--
Anthony Fok Tung-Ling
ThizLinux Laboratory [EMAIL PROTECTED] http://www.thizlinux.com/
Debian Chinese Project [EMAIL PROTECTED] http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp! http://www.olvc.ab.ca/
RE: CJK test data
Markus wrote:
For general test data for determining support of GB 18030 I suggest to contact the Chinese government and its standards agency. They have defined a certification procedure, and I assume that the data and procedure are available. I have no direct contacts for this myself.

Here is contact info from an 18030 article by Tom Emerson.
http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
Hmmm. No url. No email address. This will be interesting.

Standard Conformity Testing Center for Information Products
#1 Andingmen Dong Da Jie
Beijing, China
Tel: 84029573 or 84029792
Fax: 64007681
CJK test data
I'm starting to put together some CJK test data as described below. Before I dive in, I was curious if any of this work is already available on the web. If not, would others be interested in seeing this, once complete?

### CJK Test data. This is just a start!

Need to produce a set of CJK data that is geared towards testing string manipulation support in any software system. The intent of the data would be to test software systems, regardless of platform, software language or even API. All data need English translations and instructions for entering the data using an IME on a QWERTY keyboard.

Need tests to prove that a system SUPPORTS GB 18030
Need tests to prove that a system SUPPORTS GB 13000
Need tests to prove that a system DOES NOT support GB 18030
Need tests to prove that a system DOES NOT support GB 13000

Tests: need two sets of data, one for 13000, one for 18030

1) Sorting Test
   a) include a list of un-ordered strings.
   b) follow that with the same list, ordered properly.
2) Text searching
   - Need single character search and multiple character search. Must include the 'key' that we're looking for and strings that do and do not contain that key.
3) Character classification
   We need data to test some subset of the predicate functions: isSpace(), isAlpha(), is*():
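One possible shape for the test data outlined above is sketched here in Python. The sample strings, the dictionary layout and the helper name `run_checks` are all assumptions made for illustration. They are not official GB 18030 or GB 13000 conformance data, and sorting by code point is only a stand-in for whatever collation a real system is expected to apply.

```python
# Hypothetical sample data; not official conformance data.
SORT_CASE = {
    "unordered": ["\u6587", "\u4e2d", "\u4e00"],
    "ordered": ["\u4e00", "\u4e2d", "\u6587"],  # ascending code point order
}

SEARCH_CASE = {
    "key": "\u4e2d\u6587",               # the string being searched for
    "contains": ["\u5b66\u4e2d\u6587"],  # strings that include the key
    "lacks": ["\u4e2d\u56fd"],           # strings that do not include the key
}

# (character, predicate, expected result)
CLASSIFY_CASES = [
    ("\u3000", str.isspace, True),   # IDEOGRAPHIC SPACE
    ("\u4e2d", str.isalpha, True),   # a CJK unified ideograph
    ("\uff13", str.isdigit, True),   # FULLWIDTH DIGIT THREE
    ("\u4e2d", str.isspace, False),
]


def run_checks():
    """Return the names of failed checks; an empty list means all passed."""
    failures = []
    if sorted(SORT_CASE["unordered"]) != SORT_CASE["ordered"]:
        failures.append("sort")
    key = SEARCH_CASE["key"]
    if not all(key in s for s in SEARCH_CASE["contains"]):
        failures.append("search-hit")
    if any(key in s for s in SEARCH_CASE["lacks"]):
        failures.append("search-miss")
    for ch, predicate, expected in CLASSIFY_CASES:
        if predicate(ch) != expected:
            failures.append("classify U+%04X" % ord(ch))
    return failures


print(run_checks())
```

Keeping the data in plain tables, separate from the checking code, is one way to let the same cases be re-used against other platforms or APIs, which is the stated goal of the data set.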
RE: discovering code points with embedded nulls
I didn't get any attachments. Hmmm.

--Erik O.

-----Original Message-----
From: Stefan Persson [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 06, 2003 11:12 AM
To: Kent Karlsson
Cc: Ostermueller, Erik; [EMAIL PROTECTED]
Subject: Re: discovering code points with embedded nulls

What is that strange file (winmail.dat) attached to your mail? I really hope that it isn't a virus.

Stefan

Kent Karlsson wrote:

From what I'm hearing from you all is that a null in UTF-8 is for termination and termination only. Is this correct?

No, NULL is a character (actually a control character) among many others. However, many C/C++ APIs (mis)use NULL as a string terminator since NULL isn't very useful for other things.

/kent k
discovering code points with embedded nulls
Hello, all.

I'm dealing with an API that claims it doesn't support Unicode characters with embedded nulls. I'm trying to figure out how much of a liability this is. What is my best plan of attack for discovering precisely which code points have embedded nulls given a particular encoding? I didn't find it in the mailing list archive, and I've googled for quite a while with no luck. I'll want to do this for a few different versions of Unicode and a few different encodings.

What if I write a program using some of the data files available at unicode.org? Am I crazy (I'm new at this stuff) or am I getting warm? Perhaps this data file: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt ?

Algorithm:
INPUT: Name of Unicode code point file
INPUT: Name of encoding (perhaps UTF-8)
Read code point from file.
Expand code point to encoded format for the given encoding.
Test all constituent bytes for 0x00.
Go to next code point from file.

Thanks in advance for any help,
--Erik O.
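The algorithm described in the message above could be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than a finished tool: it walks the whole code point range instead of parsing UnicodeData.txt (so it does not distinguish Unicode versions), it skips surrogates, and it relies on Python's built-in codecs. The function name `code_points_with_zero_bytes` is hypothetical.

```python
def code_points_with_zero_bytes(encoding, limit=0x110000):
    """Return the code points below `limit` whose encoded form contains 0x00."""
    hits = []
    for cp in range(limit):
        # Surrogate code points (U+D800..U+DFFF) are not encodable characters.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        try:
            encoded = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            # The chosen encoding cannot represent this code point at all.
            continue
        # Test all constituent bytes for 0x00.
        if 0 in encoded:
            hits.append(cp)
    return hits


if __name__ == "__main__":
    print([hex(cp) for cp in code_points_with_zero_bytes("utf-8")])
    print(len(code_points_with_zero_bytes("utf-16-be", limit=0x10000)))
```

For UTF-8 the only hit is U+0000 itself, because every byte of a multi-byte UTF-8 sequence has its high bit set. For UTF-16 the list is much longer, since every code point whose high or low byte is zero (for example all of U+0001..U+00FF) qualifies. To restrict the scan to the characters assigned in a particular Unicode version, the loop could instead read the first field of each line of that version's UnicodeData.txt.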
compatibility between unicode 2.0 and 3.0
We have a large amount of C++ that currently has Unicode 2.0 support. Could you all help me figure out what types of operations will fail if we attempt to pass Unicode 3.0 thru this code? I can start the list off with
- sorting
- searching for text
- text comparison
- other character classification (isSpace, isDigit, etc...).

I understand that these operations probably won't work in ALL cases. But how about basic plumbing code -- creating and copying strings?

As I mentioned in my last post, I've enjoyed listening in on this forum -- I've learned a whole lot.

Thanks,
--Erik Ostermueller
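The classification concern in the message above can be illustrated with Python's `unicodedata` module, which exposes both its current character database and a frozen Unicode 3.2 one (`unicodedata.ucd_3_2_0`). This is only an analogy for the 2.0 versus 3.0 case, since no 2.0 tables are available there: U+20B9, assigned in Unicode 6.0, stands in for any character newer than the tables the code was built with.

```python
import unicodedata

new_char = "\u20b9"  # INDIAN RUPEE SIGN, assigned in Unicode 6.0

# The older tables treat the code point as unassigned ("Cn"), so logic that
# depends on character properties would misjudge it.
old_category = unicodedata.ucd_3_2_0.category(new_char)
new_category = unicodedata.category(new_char)
print(old_category, new_category)

# Plain "plumbing" does not consult character properties at all, so
# creating, concatenating and copying the string behaves the same.
copied = (new_char + "abc")[:]
print(copied == new_char + "abc")
```

The same split is likely to apply to older C++ code: operations driven by property or collation tables (sorting, case mapping, classification) are the ones at risk, while byte-for-byte storage and copying should be unaffected as long as the code unit size has not changed.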