GB18030
In what ways will this affect Unicode?
Does it contain anything that Unicode doesn't?
1. Is there another document/algorithm/table that does provide
guidelines for sorting numbers within strings? Something
that deals with different scripts?
ISO/IEC 14651 International String Ordering includes
an informative annex on this topic. In particular, see
Charlie,
In what ways will this affect Unicode?
Does it contain anything that Unicode doesn't?
I suggest that you take a look at Markus Scherer's paper "GB 18030: A
mega-codepage"
http://www-106.ibm.com/developerworks/library/u-china.html
It will probably answer your question on the
The small letters are for making combined syllables, like in my fake name. The regular Ri and the small Yu
make Ryu.
Some syllables require 2 katakana (or hiragana) symbols.
But the thing is, are "ra gyou" kana to be regarded as having R or L for their
consonant?
You can get lots of 2-kana syllables. Like in the
Tree said:
While the conversion between UTF-8 and UTF-16/UCS-2 is algorithmic and
very fast, we need to remember that a buffer needs to be allocated to
hold the converted result, and the data needs to be copied as things
go in and out of the library.
Well, of course. But then I am mostly a
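To illustrate the point quoted from Tree, here is a minimal sketch of a hand-written UTF-8 to UTF-16 conversion. The function name utf8_to_utf16_units is hypothetical, and the sketch assumes well-formed UTF-8 input with no validation. The conversion itself is only bit shuffling, but the result lands in a separately allocated output buffer.

```python
def utf8_to_utf16_units(data: bytes) -> list:
    """Convert well-formed UTF-8 bytes to a list of UTF-16 code units.

    Hypothetical helper for illustration; malformed input is not detected.
    """
    out = []  # separately allocated output buffer for the converted result
    i = 0
    while i < len(data):
        b = data[i]
        # The lead byte tells us how many bytes the sequence has.
        if b < 0x80:
            cp, n = b, 1
        elif b < 0xE0:
            cp, n = b & 0x1F, 2
        elif b < 0xF0:
            cp, n = b & 0x0F, 3
        else:
            cp, n = b & 0x07, 4
        # Each continuation byte contributes six payload bits.
        for k in range(1, n):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        i += n
        if cp >= 0x10000:
            # Supplementary code points become a surrogate pair.
            cp -= 0x10000
            out.append(0xD800 | (cp >> 10))
            out.append(0xDC00 | (cp & 0x3FF))
        else:
            out.append(cp)
    return out
```

Calling utf8_to_utf16_units("a中".encode("utf-8")) yields two code units, one per character, while a character outside the BMP yields two code units for a single character.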
On Fri, 21 Sep 2001, Carl W. Brown wrote:
Most systems that handle GB18030 will want to convert it to Unicode first
to reduce processing overhead.
Unless we start seeing Chinese software which is designed to utilize the
compatibility between 18030 and GBK -- font rendering apps and the
I would like to add that ICU 2.0 (in a few weeks) will have convenience functions for
in-process string transformations:
UTF-16 - UTF-8
UTF-16 - UTF-32
UTF-16 - wchar_t*
markus
I think I've figured out a way to find the beginning of a GB18030 character starting
anywhere in a document. The algorithm is similar to finding the beginning of a DBCS
character in that you scan backward until you find a byte that can only come at the
start of a character. The main difference
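The message is cut off before the details, so the following is not the poster's algorithm. It is a conservative sketch of the same general idea, assuming well-formed GB18030 and a buffer that begins on a character boundary. It scans backward to a byte that can never be a trail byte, then walks forward from that point. The names gb18030_char_len, is_unambiguous_start, and find_char_start are hypothetical.

```python
def gb18030_char_len(data: bytes, i: int) -> int:
    """Length of the GB18030 character that starts at index i."""
    b = data[i]
    if b <= 0x80 or b == 0xFF:
        return 1  # ASCII; 0x80 and 0xFF are treated as single bytes here
    # Lead byte 0x81..0xFE: a digit in second position marks a four-byte form.
    if i + 1 < len(data) and 0x30 <= data[i + 1] <= 0x39:
        return 4
    return 2


def is_unambiguous_start(b: int) -> bool:
    """True for bytes that never occur as the 2nd, 3rd, or 4th byte.

    Trail bytes are 0x30..0x39 (four-byte forms) and 0x40..0x7E or
    0x80..0xFE (two-byte forms), so everything else below 0x80 is safe.
    """
    return b <= 0x2F or 0x3A <= b <= 0x3F or b == 0x7F


def find_char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the character that contains data[pos]."""
    if not 0 <= pos < len(data):
        raise IndexError("pos out of range")
    # Step 1: scan backward to a known boundary (or the buffer start).
    j = pos
    while j > 0 and not is_unambiguous_start(data[j]):
        j -= 1
    # Step 2: walk forward character by character until pos is covered.
    start = j
    while True:
        nxt = start + gb18030_char_len(data, start)
        if nxt > pos:
            return start
        start = nxt
```

The worst case is a backward scan to the start of the buffer, which a cleverer scheme could avoid. In text that mixes in spaces, punctuation, or line breaks, a sync byte is usually close by.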
Mozilla also uses Unicode internally and is cross-platform.
[EMAIL PROTECTED] wrote:
For cross-platform software (NT, Solaris, HP, AIX),
the only third-party Unicode support
I have found so far is IBM ICU.
It provides very good support for
cross-platform software internationalization. However,
ICU internally
Basically, GB18030 is designed to encode all of the Unicode BMP in an encoding that is
backward compatible with GB2312 and GBK.
GB18030 came about because of the characters that are encoded in Unicode
but in neither GB2312 nor GBK.
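The backward compatibility described here can be checked with Python's standard gb2312, gbk, and gb18030 codecs. This is only a small sketch, and the helper name encodings_agree is hypothetical. A character found in GB2312 keeps the same bytes in all three encodings, while a character outside GBK is still encodable in GB18030 through a four-byte form.

```python
def encodings_agree(ch: str) -> bool:
    """True if ch has identical bytes in GB2312, GBK, and GB18030.

    Hypothetical helper; returns False if any of the three cannot encode ch.
    """
    try:
        return ch.encode("gb2312") == ch.encode("gbk") == ch.encode("gb18030")
    except UnicodeEncodeError:
        return False


def gb18030_only(ch: str) -> bool:
    """True if GB18030 can encode ch but GBK cannot."""
    try:
        ch.encode("gbk")
        return False
    except UnicodeEncodeError:
        ch.encode("gb18030")  # would raise if GB18030 could not encode it
        return True


# U+4E2D is in GB2312, so all three encodings give the same two bytes.
same_bytes = encodings_agree("\u4e2d")
# U+20000 lies outside GBK and takes a four-byte form in GB18030.
four_byte_len = len("\U00020000".encode("gb18030"))
```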
Thierry Sourbier wrote:
Charlie,
In what ways will this
Markus Scherer wrote:
I would like to add that ICU 2.0 (in a few weeks) will have convenience functions
for in-process string transformations:
UTF-16 - UTF-8
UTF-16 - UTF-32
UTF-16 - wchar_t*
Wait, be careful here. wchar_t is not an encoding, so in theory you cannot
Yung-Fong Tang wrote:
UTF-16 - wchar_t*
Wait, be careful here. wchar_t is not an encoding, so in theory you cannot
convert between UTF-16 and wchar_t. You can,
however, convert between UTF-16 and wchar_t* on Win32, since Microsoft declares
UTF-16 as the encoding for wchar_t.
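The platform dependence can be seen from the size of wchar_t, which ctypes reports as ctypes.sizeof(ctypes.c_wchar). The sketch below uses a hypothetical helper, utf16_to_wchar_units, and rests on a loudly stated assumption: a 2-byte wchar_t holds UTF-16 code units (as on Win32), and a 4-byte wchar_t holds UTF-32 code points (common on Unix-like systems, but not guaranteed by the C standard).

```python
import ctypes

# Size of wchar_t on the platform running this code.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)


def utf16_to_wchar_units(units, wchar_size=WCHAR_SIZE):
    """Map UTF-16 code units to wchar_t values under the stated assumption.

    wchar_size == 2: assume wchar_t holds UTF-16, so pass units through.
    wchar_size == 4: assume wchar_t holds UTF-32, so combine surrogate pairs.
    """
    if wchar_size == 2:
        return list(units)
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine high and low surrogates into one code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP code unit (or an unpaired surrogate, kept as-is)
            i += 1
    return out
```

The same UTF-16 input therefore produces different wchar_t sequences depending on the platform, which is why a conversion to wchar_t* is only well defined once the platform's convention is known.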
On Fri, Sep 21, 2001 at 04:16:50PM -0700, Yung-Fong Tang wrote:
Then... use Unicode internally in your software. Regardless of whether you use
UTF-8 or UCS2 as the data type in the interface, eventually some code
needs to convert it to UCS2 for most of the processing.
Why? UCS2 shouldn't be used at
UTF-16 - wchar_t*
Wait, be careful here. wchar_t is not an encoding, so in
theory you cannot convert between UTF-16 and wchar_t. You can,
however, convert between UTF-16 and wchar_t* on Win32,
since Microsoft declares UTF-16 as the encoding for wchar_t.
And he can also do some