RE: CJK test data

2003-02-11 Thread Andrew C. West
On Tue, 11 Feb 2003 02:35:30 -0800 (PST), [EMAIL PROTECTED] wrote:

 My Chinese-speaking colleague, Tianmiao Hu, informs me that
 this _is_ the official test data.  Can anyone confirm or deny this?
 
 It would be nice to find this same data from a more official source.
 Instructions in English would be helpful, also. :-)

These seem to be the official test data files. They are also downloadable from
various Chinese websites, including:

http://www.siisa.net.cn/epublish/gb/paper1/20010626/class00014/hwz245.htm

http://www.foundertype.com/english/web/product/test.htm

The former page has full explanations of the test procedure ... basically, to be
able to open each of the test files, and correctly* display and print the entire
contents of the page (comparing the results with the official GB 18030-2000 code
tables); and to be able to open and edit random.txt.

* N.B. correctly means according to the glyphs in the code charts for
Tibetan (Zang.txt) and Mongolian (Meng.txt).
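
The purely mechanical part of this procedure (can each file be opened at all?) can be checked with a script. The following is only a sketch, assuming Python and its built-in `gb18030` codec; the helper name `decodes_as_gb18030` is made up for illustration, and the file names are the three mentioned in this message. It does not check that glyphs display or print correctly; that still has to be compared against the code tables by eye.

```python
from pathlib import Path
from typing import Tuple


def decodes_as_gb18030(data: bytes) -> Tuple[bool, str]:
    """Return (ok, detail) for the raw bytes of one test file.

    Hypothetical helper: strict decoding only shows that the byte stream
    is well-formed GB 18030, not that the text renders correctly.
    """
    try:
        text = data.decode("gb18030", errors="strict")
    except UnicodeDecodeError as exc:
        return False, f"invalid byte sequence at offset {exc.start}"
    return True, f"{len(text)} characters decoded"


if __name__ == "__main__":
    # File names as mentioned in this message; adjust to the unpacked archive.
    for name in ("Zang.txt", "Meng.txt", "random.txt"):
        path = Path(name)
        if path.exists():
            print(name, decodes_as_gb18030(path.read_bytes()))
        else:
            print(name, "not found (skipped)")
```

A file that fails here cannot pass the display test either, so this is a cheap first filter before the manual comparison.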

Regards,

Andrew




RE: {SPAM?} RE: CJK test data

2003-02-10 Thread Erik.Ostermueller
All,

This is for all those interested in following my search for GB 18030 test data.
I'm having another one of those 'senior moments'.  I could have sworn
that I sent this to the list already, but can't find it anywhere.
Forgive me if I've already sent this info.


I found some GB 18030 test data on the website of a private consultancy.
http://www.chinesization.com/gb18030_standard.htm

Download the testdata.zip from this page.

My Chinese-speaking colleague, Tianmiao Hu, informs me that
this _is_ the official test data.  Can anyone confirm or deny this?

It would be nice to find this same data from a more official source.
Instructions in English would be helpful, also. :-)

--Erik

   -----Original Message-----
   From: Anthony Fok [mailto:[EMAIL PROTECTED]]
   Sent: Saturday, February 08, 2003 3:35 AM
   To: Ostermueller, Erik
   Cc: [EMAIL PROTECTED]
   Subject: Re: {SPAM?} RE: CJK test data
   
   
   On Fri, Feb 07, 2003 at 04:19:04PM -0600, [EMAIL PROTECTED] wrote:
Markus wrote:
 For general test data for determining support of GB 18030 I suggest
 contacting the Chinese government and its standards agency. They have
 defined a certification procedure, and I assume that the data and
 procedure are available. I have no direct contacts for this myself.

Here is contact info from an 18030 article by Tom Emerson.

http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
Hmmm.  No url.  No email address.  This will be interesting.

Standard Conformity Testing Center for Information Products 
#1 Andingmen Dong Da Jie 
Beijing, China 
Tel: 84029573 or 84029792 
Fax: 64007681
   
   Tom Emerson's article is news to me, and I find it very helpful.  :-)
   
   There _is_ an e-mail address that interested parties could try, that of
   
   CHEN Zhuang
   Chinese IT Standardization Technical Committee
   Chinese Electronics Standardization Institute
   
   His e-mail address is included in the Application of IANA Charset
   Registration for GB18030:
   
   http://www.iana.org/assignments/charset-reg/GB18030
   
   I suppose Mr. Chen does not work in the Testing Center, but he may be
   able to provide some other pointers.  :-)
   
   Cheers,
   
   Anthony
   
   -- 
   Anthony Fok Tung-Ling
   ThizLinux Laboratory   [EMAIL PROTECTED]   http://www.thizlinux.com/
   Debian Chinese Project [EMAIL PROTECTED]   http://www.debian.org/intl/zh/
   Come visit Our Lady of Victory Camp!   http://www.olvc.ab.ca/




Re: CJK test data

2003-02-07 Thread Markus Scherer
Michael (michka) Kaplan wrote:

GB18030 does not define a specific standard for sorting (as far as I know, neither does GB13000). It
is an encoding standard.


GB 18030 certainly does not define sorting. It defines a CCS/CES based on a mapping table to/from 
Unicode/ISO 10646.
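
The mapping-table point can be made concrete with a small sketch. It assumes Python's built-in `gb18030` codec as a stand-in for the standard's table (one implementation, not the normative data), and the sample characters are chosen here for illustration, one for each encoded length that GB 18030 uses.

```python
# Sample code points chosen for illustration (not from any official test data).
SAMPLES = {
    "A": 1,           # ASCII stays a single byte
    "\u4e2d": 2,      # U+4E2D, a common Han character: two bytes
    "\U00020000": 4,  # U+20000, outside the BMP: four bytes
}


def round_trips(ch: str) -> bool:
    """True if Unicode -> GB 18030 -> Unicode returns the same character."""
    return ch.encode("gb18030").decode("gb18030") == ch


for ch, expected_len in SAMPLES.items():
    encoded = ch.encode("gb18030")
    print(f"U+{ord(ch):04X} -> {encoded.hex()} "
          f"({len(encoded)} bytes, expected {expected_len}, "
          f"round trip: {round_trips(ch)})")
```

Nothing in this exercise involves ordering: the codec only converts between code points and byte sequences.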

GB 13000 is, as far as I know, just the Chinese adoption of ISO 10646. As such, it most likely does 
not define sorting either, because the relevant ISO standard is 14651 (=UCA).

For general test data for determining support of GB 18030 I suggest contacting the Chinese 
government and its standards agency. They have defined a certification procedure, and I assume that 
the data and procedure are available. I have no direct contacts for this myself.

markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.




RE: CJK test data

2003-02-07 Thread Erik.Ostermueller
Markus wrote:
   For general test data for determining support of GB 18030 I suggest
   contacting the Chinese government and its standards agency. They have
   defined a certification procedure, and I assume that the data and
   procedure are available. I have no direct contacts for this myself.

Here is contact info from an 18030 article by Tom Emerson.
http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
Hmmm.  No url.  No email address.  This will be interesting.

Standard Conformity Testing Center for Information Products 
#1 Andingmen Dong Da Jie 
Beijing, China 
Tel: 84029573 or 84029792 
Fax: 64007681




CJK test data

2003-02-06 Thread Erik.Ostermueller
I'm starting to put together some CJK test data
as described below.

Before I dive in, I was curious if any of this
work is already available on the web.
If not, would others be interested in seeing this,
once complete?

###
CJK Test data.
This is just a start!

Need to produce a set of CJK data that is geared towards
testing string manipulation support in any software system.
The intent of the data would be to test software systems,
regardless of platform, software language or even API.

All data need English translations and instructions for
entering the data using an IME on a QWERTY keyboard.

Need tests to prove that a system SUPPORTS GB 18030
Need tests to prove that a system SUPPORTS GB 13000
Need tests to prove that a system DOES NOT support GB 18030
Need tests to prove that a system DOES NOT support GB 13000
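
One possible way to separate the SUPPORTS and DOES NOT support cases for GB 18030, sketched under the assumption that the system under test exposes an encode/decode pair: characters that need the four-byte form cannot be represented by a GBK-only implementation. The example uses Python's `gbk` and `gb18030` codecs as stand-ins for a non-supporting and a supporting system; `supports_four_byte` is a hypothetical helper and U+20000 is an arbitrary probe character. Passing this probe is necessary for GB 18030 support, not sufficient, and it says nothing about GB 13000.

```python
PROBE = "\U00020000"  # arbitrary supplementary-plane character (assumption)


def supports_four_byte(codec_name: str) -> bool:
    """Hypothetical probe: can this codec round-trip a four-byte character?"""
    try:
        return PROBE.encode(codec_name).decode(codec_name) == PROBE
    except UnicodeError:
        # Raised when the codec has no representation for the probe.
        return False


print("gb18030:", supports_four_byte("gb18030"))
print("gbk:", supports_four_byte("gbk"))
```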

Tests: need two sets of data, one for 13000, one for 18030
  1) Sorting Test
     a) include a list of un-ordered strings.
     b) follow that with the same list, ordered properly.

  2) Text searching
     - Need single character search and multiple character search.
       Must include the 'key' that we're looking for and
       strings that do and do not contain that key.

  3) Character classification
     We need data to test some subset of the predicate functions:
     isSpace(), isAlpha(), is*().
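
As a rough idea of how the three kinds of test above could be laid out as data plus a checker, here is a minimal sketch in Python. Everything in it is a placeholder: the strings, the expected results, and the names `SORT_CASE`, `SEARCH_CASE`, `CLASSIFY_CASES` and `run_all` are invented for illustration. The "ordered" list assumes plain code point order, which is only one possible definition of "ordered properly"; real data would have to name the collation it expects.

```python
# Illustrative data only; strings and expected results are placeholders.
SORT_CASE = {
    "unordered": ["\u4e2d", "\u4e00", "\u4eba"],  # 中, 一, 人
    # ASSUMPTION: expected order is plain code point order.
    "ordered": ["\u4e00", "\u4e2d", "\u4eba"],    # 一, 中, 人
}

SEARCH_CASE = {
    "key": "\u4e2d",               # 中
    "contains": ["\u4e2d\u56fd"],  # 中国 includes the key
    "lacks": ["\u65e5\u672c"],     # 日本 does not
}

CLASSIFY_CASES = [
    # (character, str predicate name, expected result)
    ("\u3000", "isspace", True),   # IDEOGRAPHIC SPACE
    ("\u4e2d", "isalpha", True),   # Han character counts as a letter
    ("\uff0c", "isalpha", False),  # FULLWIDTH COMMA is punctuation
]


def run_all() -> list:
    """Run the three checks and return the names of those that failed."""
    failures = []
    # Python compares str values by code point, matching the assumption above.
    if sorted(SORT_CASE["unordered"]) != SORT_CASE["ordered"]:
        failures.append("sort")
    key = SEARCH_CASE["key"]
    if not all(key in s for s in SEARCH_CASE["contains"]):
        failures.append("search-hit")
    if any(key in s for s in SEARCH_CASE["lacks"]):
        failures.append("search-miss")
    for ch, predicate, expected in CLASSIFY_CASES:
        if getattr(ch, predicate)() != expected:
            failures.append(f"{predicate}(U+{ord(ch):04X})")
    return failures


print("failures:", run_all())
```

Keeping the data separate from the checker is what would let the same cases be reused on another platform or language, with only `run_all` rewritten.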




Re: CJK test data

2003-02-06 Thread Michael (michka) Kaplan
From: [EMAIL PROTECTED]

   1) Sorting Test
 a) include a list of un-ordered strings.
 b) follow that with the same list, ordered properly.

GB18030 does not define a specific standard for sorting (as far as I know,
neither does GB13000). It is an encoding standard.

Since GB18030 covers all of Unicode, this is a good thing.
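
A quick illustration of why the "ordered properly" list needs a collation specified from outside the encoding. This is a sketch assuming Python's built-in `gb18030` codec and three sample characters chosen here. Sorting the same strings by Unicode code point and by their encoded GB 18030 bytes gives two different orders, and neither is a collation defined by the standard.

```python
words = ["\u4e2d", "\u4e00", "\u4eba"]  # 中, 一, 人 (sample characters)

# Order 1: Python's default string comparison, i.e. by code point.
by_code_point = sorted(words)

# Order 2: binary order of the GB 18030 encoded bytes.
by_gb_bytes = sorted(words, key=lambda s: s.encode("gb18030"))

print("code point order:   ", by_code_point)
print("GB 18030 byte order:", by_gb_bytes)
print("same order?", by_code_point == by_gb_bytes)
```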

MichKa