I'm starting to put together some CJK test data
as described below.

Before I dive in, I'm curious whether any of this
work is already available on the web.
If not, would others be interested in seeing this
once it is complete?

###############################################################
CJK Test data.
This is just a start!

    Need to produce a set of CJK data geared towards testing
    string manipulation support in any software system,
    regardless of platform, programming language or even API.

    All data need English translations and instructions for
    entering the data using an IME on a QWERTY keyboard.

    Need tests to prove that a system SUPPORTS GB 18030
    Need tests to prove that a system SUPPORTS GB 13000
    Need tests to prove that a system DOES NOT support GB 18030
    Need tests to prove that a system DOES NOT support GB 13000
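A rough sketch of what a "supports / does not support" check could look like, using Python's built-in codecs. The helper name supports() and the two sample characters are my own choices for illustration. Using the "gbk" codec as a stand-in for a system limited to the older GB 13000-era repertoire is an assumption, not a definition of GB 13000 support.

```python
def supports(codec: str, text: str) -> bool:
    """Return True if `text` survives an encode/decode round trip in `codec`."""
    try:
        return text.encode(codec).decode(codec) == text
    except (UnicodeError, LookupError):
        # UnicodeError: the codec cannot represent the text.
        # LookupError: the codec is not available on this system.
        return False

# Hypothetical probe characters, chosen for illustration only.
BMP_HANZI = "\u4e2d"        # common ideograph in the basic CJK block
EXT_B_HANZI = "\U00020000"  # CJK Extension B, outside the BMP

for codec in ("gb18030", "gbk"):
    for label, ch in (("BMP", BMP_HANZI), ("Ext-B", EXT_B_HANZI)):
        print(codec, label, supports(codec, ch))
```

The Extension B character needs a four-byte GB 18030 sequence, so a system that only handles the one- and two-byte forms should fail on it. That makes it a candidate for the "DOES NOT support GB 18030" data.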

    Tests: need two sets of data, one for GB 13000 and one for GB 18030
      1) Sorting Test
        a) include a list of unordered strings.
        b) follow that with the same list, ordered properly.
  
      2) Text searching
        - Need single-character and multiple-character searches.
          Must include the 'key' that we're looking for and
          strings that do and do not contain that key.

      3) Character classification
          We need data to test some subset of the predicate functions:
          isSpace(), isAlpha(), is*():
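To make the three tests above concrete, here is a minimal sketch of how the data could be laid out and checked. Everything in it is an assumption for illustration: the sample strings, the dictionary layout, the check_* helper names, and the use of plain code-point order as the "proper" order. A real data set would need a stated collation (pinyin, stroke count, etc.) and review by native readers. Python's str methods stand in for the isSpace()/isAlpha() style predicates.

```python
# 1) Sorting: an unordered list followed by the same list in expected order.
#    Expected order here is plain code-point order (an assumption).
SORT_CASE = {
    "unordered": ["中", "一", "人"],  # U+4E2D, U+4E00, U+4EBA
    "ordered": ["一", "中", "人"],
}

# 2) Searching: a key, plus strings that do and do not contain it.
SEARCH_CASES = [
    {"key": "中", "contains": ["中文", "日中"], "lacks": ["日本語"]},
    {"key": "中文", "contains": ["中文测试"], "lacks": ["中日文"]},
]

# 3) Classification: (character, predicate, expected result).
CLASSIFY_CASES = [
    ("\u3000", str.isspace, True),   # IDEOGRAPHIC SPACE
    ("中", str.isalpha, True),       # ideograph counts as a letter
    ("。", str.isalpha, False),      # ideographic full stop is punctuation
    ("\uff11", str.isdigit, True),   # FULLWIDTH DIGIT ONE
]

def check_sort(case, key=None):
    """Sort the unordered list and compare with the expected order."""
    return sorted(case["unordered"], key=key) == case["ordered"]

def check_search(case):
    """The key must be found in every 'contains' string and in no 'lacks' string."""
    key = case["key"]
    found = all(key in s for s in case["contains"])
    absent = not any(key in s for s in case["lacks"])
    return found and absent

def check_classification(cases):
    """Every predicate must return exactly the expected boolean."""
    return all(pred(ch) is expected for ch, pred, expected in cases)

print(check_sort(SORT_CASE))
print(all(check_search(c) for c in SEARCH_CASES))
print(check_classification(CLASSIFY_CASES))
```

Keeping the data separate from the check functions is deliberate: the same tables could be exported to a plain file and fed to a system written in any other language.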
