RE: CJK test data
On Tue, 11 Feb 2003 02:35:30 -0800 (PST), [EMAIL PROTECTED] wrote:

My Chinese-speaking colleague, Tianmiao Hu, informs me that this _is_ the official test data. Can anyone confirm or deny this? It would be nice to find this same data from a more official source. Instructions in English would be helpful, also. :-)

These seem to be the official test data files. They can also be downloaded from various Chinese websites, including:
http://www.siisa.net.cn/epublish/gb/paper1/20010626/class00014/hwz245.htm
http://www.foundertype.com/english/web/product/test.htm

The former page explains the test procedure in full. Basically, a system must be able to open each of the test files and correctly* display and print the entire contents of the page (comparing the results with the official GB 18030-2000 code tables), and it must be able to open and edit random.txt.

* N.B. "Correctly" means according to the glyphs in the code charts for Tibetan (Zang.txt) and Mongolian (Meng.txt).

Regards,
Andrew
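The test Andrew describes is visual: each file must display and print with the right glyphs. One step that could be automated beforehand is checking that a file's bytes are well-formed GB 18030 at all. The Python sketch below assumes that Python's built-in `gb18030` codec is an adequate stand-in for the official mapping tables (an assumption, not something the test suite promises); `is_valid_gb18030` and `check_test_file` are hypothetical helper names.

```python
def is_valid_gb18030(data: bytes) -> bool:
    """Return True if data decodes as GB 18030 and re-encodes to the same bytes.

    Assumes Python's built-in "gb18030" codec matches the official mapping
    tables. This checks byte-level validity only, not glyph rendering.
    """
    try:
        text = data.decode("gb18030")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return text.encode("gb18030") == data


def check_test_file(path: str) -> bool:
    """Hypothetical helper: read a test file (e.g. Zang.txt) as raw bytes and validate it."""
    with open(path, "rb") as f:
        return is_valid_gb18030(f.read())


print(is_valid_gb18030("中文".encode("gb18030")))  # True
print(is_valid_gb18030(b"\xd6"))  # False: truncated two-byte sequence
```

A file that passes still has to be compared by eye against the code charts; the check only rules out files the decoder cannot read in the first place.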
RE: {SPAM?} RE: CJK test data
All,

For all those interested in following my search for GB 18030 test data: I'm having another one of those 'senior moments'. I could have sworn that I had already sent this to the list, but I can't find it anywhere. Forgive me if I've already sent this info.

I found some GB 18030 test data on the website of a private consultancy:
http://www.chinesization.com/gb18030_standard.htm
Download testdata.zip from this page.

My Chinese-speaking colleague, Tianmiao Hu, informs me that this _is_ the official test data. Can anyone confirm or deny this? It would be nice to find this same data from a more official source. Instructions in English would be helpful, also. :-)

--Erik

-----Original Message-----
From: Anthony Fok [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 08, 2003 3:35 AM
To: Ostermueller, Erik
Cc: [EMAIL PROTECTED]
Subject: Re: {SPAM?} RE: CJK test data

On Fri, Feb 07, 2003 at 04:19:04PM -0600, [EMAIL PROTECTED] wrote:

Markus wrote:
For general test data for determining support of GB 18030 I suggest contacting the Chinese government and its standards agency. They have defined a certification procedure, and I assume that the data and procedure are available. I have no direct contacts for this myself.

Here is contact info from an 18030 article by Tom Emerson.
http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
Hmm. No URL. No email address. This will be interesting.

Standard Conformity Testing Center for Information Products
#1 Andingmen Dong Da Jie
Beijing, China
Tel: 84029573 or 84029792
Fax: 64007681

Tom Emerson's article is news to me, and I find it very helpful. :-) There _is_ an e-mail address that interested parties could try, that of

CHEN Zhuang
Chinese IT Standardization Technical Committee
Chinese Electronics Standardization Institute

His e-mail address is included in the Application of IANA Charset Registration for GB18030:
http://www.iana.org/assignments/charset-reg/GB18030

I suppose Mr. Chen does not work in the Testing Center, but he may be able to provide some other pointers. :-)

Cheers,
Anthony
-- 
Anthony Fok Tung-Ling
ThizLinux Laboratory [EMAIL PROTECTED] http://www.thizlinux.com/
Debian Chinese Project [EMAIL PROTECTED] http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp! http://www.olvc.ab.ca/
Re: CJK test data
Michael (michka) Kaplan wrote:

GB18030 does not define a specific standard for sorting (as far as I know, neither does GB13000). It is an encoding standard.

GB 18030 certainly does not define sorting. It defines a CCS/CES based on a mapping table to/from Unicode/ISO 10646. GB 13000 is, as far as I know, just the Chinese adoption of ISO 10646. As such, it is likely to also not define sorting because the relevant ISO standard is 14651 (=UCA).

For general test data for determining support of GB 18030 I suggest contacting the Chinese government and its standards agency. They have defined a certification procedure, and I assume that the data and procedure are available. I have no direct contacts for this myself.

markus
-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.
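Markus's point that GB 18030 is a CCS/CES built on a mapping table to and from Unicode can be seen with Python's built-in `gb18030` codec (assumed here to follow that table). The sketch encodes one character from each range and checks that decoding gives it back; the byte lengths show the one-, two- and four-byte forms of the encoding, and nothing in it says anything about sort order. `gb18030_bytes` is a hypothetical helper name.

```python
def gb18030_bytes(ch: str) -> bytes:
    """Encode one character as GB 18030 and confirm the mapping round-trips."""
    encoded = ch.encode("gb18030")
    assert encoded.decode("gb18030") == ch, "mapping did not round-trip"
    return encoded


# ASCII, a GB 2312 ideograph, and a CJK Extension B ideograph outside the BMP.
for ch in ["A", "中", "\U00020000"]:
    b = gb18030_bytes(ch)
    print(f"U+{ord(ch):04X} -> {b.hex()} ({len(b)}-byte form)")
# Expected lengths: 1, 2 and 4 bytes respectively.
```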
RE: CJK test data
Markus wrote:

For general test data for determining support of GB 18030 I suggest contacting the Chinese government and its standards agency. They have defined a certification procedure, and I assume that the data and procedure are available. I have no direct contacts for this myself.

Here is contact info from an 18030 article by Tom Emerson.
http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
Hmm. No URL. No email address. This will be interesting.

Standard Conformity Testing Center for Information Products
#1 Andingmen Dong Da Jie
Beijing, China
Tel: 84029573 or 84029792
Fax: 64007681
CJK test data
I'm starting to put together some CJK test data as described below. Before I dive in, I was curious whether any of this work is already available on the web. If not, would others be interested in seeing this, once complete?

### CJK Test data. This is just a start!

Need to produce a set of CJK data that is geared towards testing string manipulation support in any software system. The intent of the data would be to test software systems, regardless of platform, programming language or even API. All data need English translations and instructions for entering the data using an IME on a QWERTY keyboard.

Need tests to prove that a system SUPPORTS GB 18030
Need tests to prove that a system SUPPORTS GB 13000
Need tests to prove that a system DOES NOT support GB 18030
Need tests to prove that a system DOES NOT support GB 13000

Tests: need two sets of data, one for 13000, one for 18030

1) Sorting Test
   a) include a list of un-ordered strings.
   b) follow that with the same list, ordered properly.
2) Text searching
   - Need single character search and multiple character search. Must include the 'key' that we're looking for and strings that do and do not contain that key.
3) Character classification
   We need data to test some subset of the predicate functions: isSpace(), isAlpha(), is*():
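The text-searching and character-classification items above could be expressed as data plus a tiny check, so the same cases can be replayed against any system. The sketch below is one possible shape in Python; `SearchCase`, `run_search_case` and `classify` are hypothetical names, the sample strings are made up for illustration (they are not from any official suite), and Python's `str.isspace()` / `str.isalpha()` stand in for the isSpace()/isAlpha() predicates mentioned in the message.

```python
from dataclasses import dataclass, field


@dataclass
class SearchCase:
    """One text-searching test: a key plus strings that must / must not match."""
    key: str  # single- or multi-character key
    hits: list[str] = field(default_factory=list)    # strings that contain the key
    misses: list[str] = field(default_factory=list)  # strings that do not


def run_search_case(case: SearchCase) -> bool:
    """Pass only if every hit contains the key and no miss does."""
    return all(case.key in s for s in case.hits) and not any(
        case.key in s for s in case.misses
    )


def classify(ch: str) -> dict[str, bool]:
    """Python's nearest equivalents of isSpace()/isAlpha() for one character."""
    return {"isSpace": ch.isspace(), "isAlpha": ch.isalpha()}


# Hypothetical sample data, not taken from any official test suite.
single = SearchCase(key="中", hits=["中国", "我会说中文"], misses=["日本", "文字"])
multi = SearchCase(key="中文", hits=["中文测试"], misses=["中国", "文中"])
print(run_search_case(single), run_search_case(multi))  # True True

# Ideograph, IDEOGRAPHIC SPACE, FULLWIDTH COMMA.
for ch in ["中", "\u3000", "，"]:
    print(f"U+{ord(ch):04X}", classify(ch))
```

Python treats U+3000 IDEOGRAPHIC SPACE as whitespace and CJK ideographs as alphabetic; other APIs may classify them differently, which is the kind of difference such data would expose.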
Re: CJK test data
From: [EMAIL PROTECTED]

1) Sorting Test
   a) include a list of un-ordered strings.
   b) follow that with the same list, ordered properly.

GB18030 does not define a specific standard for sorting (as far as I know, neither does GB13000). It is an encoding standard. Since GB18030 covers all of Unicode, this is a good thing.

MichKa
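MichKa's point can be made concrete: sorting by GB 18030 byte values and sorting by Unicode code points give different orders, and the encoding itself does not say which (if either) is "proper". The Python sketch below uses only the built-in `gb18030` codec; the three ideographs are arbitrary examples.

```python
words = ["中", "一", "啊"]  # U+4E2D, U+4E00, U+554A

# Order by Unicode code point (Python's default string comparison).
by_code_point = sorted(words)

# Order by the raw GB 18030 bytes: 啊 = B0A1, 一 = D2BB, 中 = D6D0.
by_gb18030_bytes = sorted(words, key=lambda s: s.encode("gb18030"))

print(by_code_point)     # ['一', '中', '啊']
print(by_gb18030_bytes)  # ['啊', '一', '中']
```

The byte order happens to look like pinyin here because these characters sit in the GB 2312 level-1 block, which is arranged by pinyin. That is a property of the legacy code layout, not a collation defined by GB 18030, so a sorting test would still need to name the ordering it expects.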