GB18030

2001-09-21 Thread Charlie Jolly
GB18030: In what ways will this affect Unicode? Does it contain anything that Unicode doesn't?

RE: numeric ordering

2001-09-21 Thread Karlsson Kent - keka
1. Is there another document/algorithm/table that does provide guidelines for sorting numbers within strings? Something that deals with different scripts? ISO/IEC 14651 International String Ordering includes an informative annex on this topic. In particular, see
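For readers who want something concrete: the following is a minimal sketch of sorting numbers inside strings by value, not the method described in the ISO/IEC 14651 annex. It assumes that treating every run of Unicode decimal digits as one integer is acceptable; the name `natural_key` is made up for illustration.

```python
import re

# For str patterns, \d matches any Unicode decimal digit (category Nd),
# so digit runs in scripts other than Latin are also picked up.
_DIGIT_RUN = re.compile(r"(\d+)")

def natural_key(s):
    """Hypothetical sort key: text pieces compare as text, digit runs as integers."""
    parts = _DIGIT_RUN.split(s)
    # re.split with a capturing group alternates text (even index) and
    # digit runs (odd index), so the types always line up when keys are compared.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["file10", "file2", "file\u0661"]  # U+0661 is ARABIC-INDIC DIGIT ONE
print(sorted(names, key=natural_key))
```

A plain code point sort would put "file10" before "file2"; with this key the digit runs are compared by numeric value regardless of script.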

Re: GB18030

2001-09-21 Thread Thierry Sourbier
Charlie, In what ways will this affect Unicode? Does it contain anything that Unicode doesn't? I suggest that you take a look at Markus Scherer's paper "GB 18030: A mega-codepage" http://www-106.ibm.com/developerworks/library/u-china.html It will probably answer your question on the

Kana syllables

2001-09-21 Thread てんどうりゅうじ
The small letters are for making combinations like the one in my fake name. The regular Ri and the small Yu make Ryu. Some syllables require 2 katakana (or hiragana) symbols. But the thing is, are "ra gyou" kana to be regarded as having R or L for their consonant? You can get lots of 2-kana syllables. Like in the

Re: UTF-8 UCS-2/UTF-16 conversion for library use

2001-09-21 Thread Kenneth Whistler
Tree said: While the conversion between UTF-8 and UTF-16/UCS-2 is algorithmic and very fast, we need to remember that a buffer needs to be allocated to hold the converted result, and the data needs to be copied as things go in and out of the library. Well, of course. But then I am mostly a
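To illustrate the allocate-and-copy point, here is a small sketch using Python's built-in codecs. The helper names are invented for this example, and little-endian UTF-16 without a byte order mark is an assumption.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Return a newly allocated UTF-16LE copy of UTF-8 input."""
    return data.decode("utf-8").encode("utf-16-le")

def utf16_to_utf8(data: bytes) -> bytes:
    """Return a newly allocated UTF-8 copy of UTF-16LE input."""
    return data.decode("utf-16-le").encode("utf-8")

original = "h\u00e9llo \u4e2d".encode("utf-8")
converted = utf8_to_utf16(original)

# The two forms differ in size, so the result cannot reuse the input buffer.
print(len(original), len(converted))
print(utf16_to_utf8(converted) == original)
```

Each call produces a fresh byte string, which is the per-call cost being discussed whenever data crosses the library boundary.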

RE: GB18030

2001-09-21 Thread Sampo Syreeni
On Fri, 21 Sep 2001, Carl W. Brown wrote: Most systems that handle GB18030 will want to convert it to Unicode first to reduce processing overhead. Unless we start seeing Chinese software which is designed to utilize the compatibility between 18030 and GBK -- font rendering apps and the

Re: 3rd-party cross-platform UTF-8 support

2001-09-21 Thread Markus Scherer
I would like to add that ICU 2.0 (in a few weeks) will have convenience functions for in-process string transformations: UTF-16 <-> UTF-8, UTF-16 <-> UTF-32, UTF-16 <-> wchar_t* markus

RE: GB18030

2001-09-21 Thread Murray Sargent
I think I've figured out a way to find the beginning of a GB18030 character starting anywhere in a document. The algorithm is similar to finding the beginning of a DBCS character in that you scan backward until you find a byte that can only come at the start of a character. The main difference
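The message is cut off here, so what follows is not necessarily the algorithm it goes on to describe. It is a sketch of one way to do this kind of resynchronisation, assuming valid GB18030 input and the usual byte ranges (single byte 0x00-0x7F, two-byte lead 0x81-0xFE, four-byte second byte 0x30-0x39). It scans backward to a byte that can never be a trail byte, then parses forward. All function names are made up.

```python
def _is_sync_byte(b):
    # These values never occur as the 2nd, 3rd or 4th byte of a multi-byte
    # sequence, so a byte like this always starts a one-byte character.
    return b < 0x30 or 0x3A <= b <= 0x3F or b == 0x7F

def _char_len(data, i):
    # Length of the character that starts at index i (valid input assumed).
    if 0x81 <= data[i] <= 0xFE:
        # A digit in the second position marks a four-byte sequence.
        if i + 1 < len(data) and 0x30 <= data[i + 1] <= 0x39:
            return 4
        return 2
    return 1

def gb18030_char_start(data, pos):
    """Return the index of the first byte of the character covering data[pos]."""
    start = pos
    while start > 0 and not _is_sync_byte(data[start]):
        start -= 1              # back up to a known character boundary
    i = start
    while True:                 # walk forward one whole character at a time
        n = _char_len(data, i)
        if i + n > pos:
            return i
        i += n

sample = "A1\u4e2d9\U00020000 x".encode("gb18030")
print([gb18030_char_start(sample, p) for p in range(len(sample))])
```

The backward scan has no fixed bound here: it stops only at a sync byte or at the start of the buffer, so text with no ASCII punctuation or spaces gets reparsed from the beginning.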

Re: 3rd-party cross-platform UTF-8 support

2001-09-21 Thread Yung-Fong Tang
Mozilla also uses Unicode internally and is cross-platform. [EMAIL PROTECTED] wrote: For cross-platform software (NT, Solaris, HP, AIX), the only 3rd-party Unicode support I found so far is IBM ICU. It provides very good support for cross-platform software internationalization. However, ICU internally

Re: GB18030

2001-09-21 Thread Yung-Fong Tang
Basically, GB18030 is designed to encode all of the Unicode BMP in an encoding that is backward compatible with GB2312 and GBK. GB18030 came about because of the characters that are encoded in Unicode but not in GB2312 or GBK. Thierry Sourbier wrote: Charlie, In what ways will this
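The compatibility claim can be checked with Python's built-in `gb2312`, `gbk` and `gb18030` codecs. This is only a spot check on a couple of sample characters (my choice), not a proof for the whole repertoire.

```python
common = "\u4e2d\u6587"   # two Han characters that GB2312 already covers
thai = "\u0e01"           # THAI CHARACTER KO KAI: in the BMP, but not in GBK

# Characters shared with the older standards keep the same two-byte codes.
same_bytes = (
    common.encode("gb2312")
    == common.encode("gbk")
    == common.encode("gb18030")
)
print(same_bytes)

# A BMP character outside GBK cannot be encoded in GBK at all ...
try:
    thai.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False
print(gbk_ok)

# ... but GB18030 gives it a four-byte code that round-trips.
four = thai.encode("gb18030")
print(len(four), four.decode("gb18030") == thai)
```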

Re: 3rd-party cross-platform UTF-8 support

2001-09-21 Thread Yung-Fong Tang
Markus Scherer wrote: I would like to add that ICU 2.0 (in a few weeks) will have convenience functions for in-process string transformations: UTF-16 <-> UTF-8, UTF-16 <-> UTF-32, UTF-16 <-> wchar_t* Wait, be careful here. wchar_t is not an encoding. So.. in theory, you cannot

Re: 3rd-party cross-platform UTF-8 support

2001-09-21 Thread Markus Scherer
Yung-Fong Tang wrote: UTF-16 <-> wchar_t* Wait, be careful here. wchar_t is not an encoding. So.. in theory, you cannot convert between UTF-16 and wchar_t. You can, however, convert between UTF-16 and wchar_t* on win32, since Microsoft declares UTF-16 as the encoding for wchar_t.
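A rough way to see the platform dependence from Python is to ask `ctypes` how wide `wchar_t` is. The sketch below assumes that a 2-byte `wchar_t` holds UTF-16 and a 4-byte one holds UTF-32. That is common practice, but the C standard does not guarantee it, which is exactly the objection above. The function names are invented for illustration.

```python
import ctypes
import sys

WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)   # typically 2 on win32, 4 on most Unix systems
_ORDER = "le" if sys.byteorder == "little" else "be"

def wchar_codec():
    """Guess a codec for wchar_t from its width alone (an assumption, see above)."""
    if WCHAR_SIZE == 2:
        return "utf-16-" + _ORDER
    if WCHAR_SIZE == 4:
        return "utf-32-" + _ORDER
    raise RuntimeError("unexpected wchar_t size: %d" % WCHAR_SIZE)

def utf16_to_wchar_bytes(utf16: bytes) -> bytes:
    """Re-encode native-endian UTF-16 bytes into the guessed wchar_t layout."""
    return utf16.decode("utf-16-" + _ORDER).encode(wchar_codec())

utf16 = "a\u4e2d".encode("utf-16-" + _ORDER)
wide = utf16_to_wchar_bytes(utf16)
print(WCHAR_SIZE, len(wide))
```

Where `wchar_t` is two bytes the conversion is a plain copy; elsewhere it is a real transcoding step, so one name, "UTF-16 <-> wchar_t*", covers different work on different platforms.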

Re: 3rd-party cross-platform UTF-8 support

2001-09-21 Thread David Starner
On Fri, Sep 21, 2001 at 04:16:50PM -0700, Yung-Fong Tang wrote: Then... use Unicode internally in your software. Regardless of whether you use UTF-8 or UCS2 as the data type in the interface, eventually some code needs to convert it to UCS2 for most of the processing. Why? UCS2 shouldn't be used at

RE: 3rd-party cross-platform UTF-8 support

2001-09-21 Thread Yves Arrouye
UTF-16 <-> wchar_t* Wait, be careful here. wchar_t is not an encoding. So.. in theory, you cannot convert between UTF-16 and wchar_t. You can, however, convert between UTF-16 and wchar_t* on win32, since Microsoft declares UTF-16 as the encoding for wchar_t. And he can also do some