I am intrigued by GB18030 encoding. There is a table of equivalences in
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml
No doubt Unihan will at some stage include these 2- and 4-byte values.
I enquired about the 'super font' created by a Beijing foundry,
http
Raymond Mercier wrote:
I am intrigued by GB18030 encoding. There is a table of equivalences in
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml
No doubt Unihan will at some stage include these 2- and 4-byte values.
I enquired about the 'super font' created
Mark Shoulson writes: their Super Font is
bundled with Microsoft Office XP, and even Microsoft's prices haven't
gotten that high! From Microsoft, http://www.microsoft.com/globaldev/DrIntl/columns/015/default.mspx : "A font that contains Simplified Chinese glyphs from
both CJK Extension A and B
Possibly they were quoting the price for one to be able
to bundle their font with software that you would sell.
Judging by the website, I don't think that their intent is
to sell directly to individual users. In that context, the
price doesn't seem unreasonable at all. When you
consider that
From: Mark E. Shoulson [EMAIL PROTECTED]
Raymond Mercier wrote:
I am intrigued by GB18030 encoding. There is a table of equivalences in
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml
No doubt Unihan will at some stage include these 2- and 4-byte values.
I
Raymond Mercier wrote:
But that link to proofing tools leads nowhere. Maybe it's not so
easy to get the CHS version.
http://www.amazon.com/exec/obidos/tg/detail/-/BBZ54P/qid=1082651762/sr=8-1/ref=pd_ka_1/103-8333725-5907026?v=glances=softwaren=507846
Includes ~140
-
From: Eric Muller
To: [EMAIL PROTECTED]
Sent: Thursday, April 22, 2004 5:40 PM
Subject: Re: GB18030 and super font
Raymond Mercier wrote:
But that link to proofing tools leads nowhere. Maybe it's not so easy to get
the CHS version.
http://www.amazon.com/exec
Raymond Mercier wrote on 4/22/2004, 7:35 AM:
I enquired about the 'super font' created by a Beijing foundry,
http://font.founder.com.cn/english/web/index.htm, and am fairly astonished
at the prices, as you see from the attached.
The cost of producing these fonts is much higher than
On 22/04/2004 10:04, Raymond Mercier wrote:
Eric,
Amazin' Amazon!! Now why didn't I think of that ?
In fact, Amazon.co.uk says it is discontinued, so I would have to
get it from Amazon in the US. It is not the first time that the two
Amazons have failed to connect.
Many thanks for the tip,
In case you want to test
your GB18030 font, you can use Netscape 7 (or the latest Mozilla) and then
visit my GB18030 test pages at
http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=10
They should match the paper copy of the GB18030-2000 standard page for
page. I also
Raymond Mercier wrote:
Mark Shoulson writes:
their Super Font is bundled with Microsoft Office XP, and
even Microsoft's prices haven't gotten that high!
From Microsoft,
http://www.microsoft.com/globaldev/DrIntl/columns/015/default.mspx :
"A font that contains
Hello. I believe this must be a frequent question, but I googled around
and I didn't find a satisfying tool. It seems most converters do GB2312
but not GB18030.
I have 100+ files to convert, and ordinary graphical/web-based converters
won't do the job well.
On my FreeBSD there is a ported tool
Hello. I believe this must be a frequent question, but I googled around
and I didn't find a satisfying tool. It seems most converters do GB2312
but not GB18030.
Both GNU libc iconv and GNU libiconv support GB18030. I assume the libiconv
distribution includes the command line utility.
Regards
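For a batch job like the 100+ files mentioned above, the libiconv command-line tool can be run once per file (`iconv -f GB18030 -t UTF-8 in.txt > out.txt`). As an alternative, here is a minimal Python sketch that relies on the standard library's built-in `gb18030` codec. The `*.txt` pattern, the directory layout, and the helper names `convert_file` and `convert_tree` are hypothetical choices for illustration.

```python
import pathlib
from typing import List


def convert_file(src: pathlib.Path, dst: pathlib.Path,
                 src_encoding: str = "gb18030",
                 dst_encoding: str = "utf-8") -> None:
    """Re-encode one file. Decoding is strict, so malformed input raises."""
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(text.encode(dst_encoding))


def convert_tree(src_dir, dst_dir, pattern: str = "*.txt") -> List[pathlib.Path]:
    """Convert every matching file under src_dir, mirroring paths into dst_dir."""
    src_dir, dst_dir = pathlib.Path(src_dir), pathlib.Path(dst_dir)
    converted = []
    for src in sorted(src_dir.rglob(pattern)):
        if not src.is_file():          # skip directories that match the pattern
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        convert_file(src, dst)
        converted.append(dst)
    return converted
```

Because decoding is strict, a file that is not valid GB18030 stops the run with a `UnicodeDecodeError` instead of being silently mangled. For 100+ files, that is usually what you want.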
Peter Jacobi wrote:
Hello. I believe this must be a frequent question, but I googled around
and I didn't find a satisfying tool. It seems most converters do GB2312
but not GB18030.
Both GNU libc iconv and GNU libiconv support GB18030. I assume the libiconv
distribution includes the command
You can also use 'nsconv', which comes with the Mozilla source code and supports GB18030.
See http://www.mozilla.org/projects/l10n/mlp_tools.html for details.
Zhang Weiwu wrote on 3/5/2004, 6:43 AM:
Hello. I believe this must be a frequent question, but I googled around
and I didn't find a satisfying tool
Hi Will,
The ICU library is a good source for information like this. See:
http://oss.software.ibm.com/icu/charset/
The data table is located here:
http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml
Read the note on the first page.
There are official sources as well, but I
Doug,
However, 16 bit characters were a hard enough sell in the good old
days. If we had started out withug 2bit characters we would still be
dreaming about Unicode.
I think Carl meant with 32-bit characters. I don't know what kind of
word withug is (Old English?), but I like it.
It
Michael Yau wrote:
Markus,
The standard does _not_ require software to _process_ text internally in GB18030. It
is sufficient to have a converter and to process in Unicode, which does
contain all of the characters.
Just curious, do you have this in writing from the China standards body?
I don't
Jane, you are right; I oversimplified. I was trying to make the point that you need not _process_ text
in GB18030, and that Unicode processing plus conversion to/from GB18030 fulfills the requirement to be
able to read and write GB18030 text.
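The "process in Unicode, convert at the boundary" approach described above can be sketched in a few lines of Python. Here the standard library's `gb18030` codec stands in for whatever converter a real system would use, and `process_gb18030` is a hypothetical helper name.

```python
from typing import Callable


def process_gb18030(data: bytes, transform: Callable[[str], str]) -> bytes:
    """Read GB18030, do all processing on Unicode text, write GB18030."""
    text = data.decode("gb18030")      # GB18030 -> Unicode at the input boundary
    result = transform(text)           # processing never sees GB18030 bytes
    return result.encode("gb18030")    # Unicode -> GB18030 at the output boundary


# Example: upper-case Latin letters; CJK (including Extension B) passes through.
out = process_gb18030("abc 中文 \U00020000".encode("gb18030"), str.upper)
print(out.decode("gb18030"))
```

The transform only ever sees Unicode strings. Supporting GB18030 input and output then reduces to having a correct converter at each end.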
Yes, you need to have font support for all the characters
that I shouldn't
care.
John
Microsoft
-Original Message-
From: Doug Ewell [mailto:dewell;adelphia.net]
Sent: Thursday, November 14, 2002 8:26 PM
To: Unicode Mailing List
Cc: Carl W. Brown
Subject: Re: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)
Carl W. Brown cbrown at xnetinc dot com
Thanks Mark !
That may mean IBM AIX 5 supports conversion between GB18030 and
Unicode, but I don't see this as system-level support, because
there are no locale names for GB18030 in the AIX 5 documentation:
http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/locale.htm
Zh_CN
Jane,
One of the problems is that early Unicode adopters used the 16-bit UCS-2
encoding of Unicode. Converting to UTF-16 requires surrogate support.
Some of the GB18030 characters require this support. ICU is dedicated to
Unicode support so a lot of effort is put into ICU to keep it up
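The surrogate support that UCS-2 code needs in order to handle UTF-16 is small but easy to get wrong. A supplementary code point is split into a high and a low 16-bit unit, so "one unit = one character" no longer holds for GB18030 characters outside the BMP. Here is a minimal Python sketch of both directions. The function names are hypothetical; the arithmetic is the standard UTF-16 surrogate scheme.

```python
from typing import List


def to_utf16_units(cp: int) -> List[int]:
    """Encode one Unicode scalar value as UTF-16 code units (one or two)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]                    # BMP: one unit, same as UCS-2
    v = cp - 0x10000                   # 20-bit offset into planes 1-16
    return [0xD800 + (v >> 10),        # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]      # low surrogate carries the bottom 10 bits


def code_points(units: List[int]) -> List[int]:
    """Decode UTF-16 code units back to code points, pairing surrogates."""
    out: List[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2                     # two units, but only one character
        else:
            out.append(u)              # BMP unit (or an unpaired surrogate)
            i += 1
    return out


# U+20000 is the first CJK Extension B character.
print([hex(u) for u in to_utf16_units(0x20000)])  # ['0xd840', '0xdc00']
```

Any UCS-2-era function that indexes, counts, or slices by 16-bit unit must be checked for the `i += 2` case shown in `code_points`.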
string handling assume that the single-code-point type is the same as the string base unit.
This one design point requires 32-bit wchar_t not just for Unicode but also for the character sets
of EUC-TW and GB18030.
You seem to suggest that there is a problem with 16-bit Unicode. It does take some
Jane Liu wrote:
That may mean IBM AIX 5 supports conversion between GB18030 and
Unicode, but I don't see this as system-level support, because
there are no locale names for GB18030 in the AIX 5 documentation:
The GB 18030 standard requires software to be able to _read and write_ text in the GB18030
From: Carl W. Brown [EMAIL PROTECTED]
Other companies
like Microsoft took a very big gamble and implemented surrogate
support in Windows 2000 based on early drafts of the Unicode standard.
If they had not done it this way, or had guessed wrong, they might not even
have support
Markus,
The standard does _not_ require software to _process_ text internally in GB18030. It
is sufficient to have a converter and to process in Unicode, which does
contain all of the characters.
Just curious, do you have this in writing from the China standards body?
- Michael
Markus Scherer wrote
Mark,
I think a converter alone is not sufficient. What about the following
support:
- IME (to input CJK Ext.A characters through GB18030/Unicode code)
- X-Windows font support
- iconv support
- mbtowc(), mbstowcs(), mblen()...
- and so on...
You need to be able to do what you can do on Solaris
To: Markus Scherer [EMAIL PROTECTED], unicode [EMAIL PROTECTED]
cc:
Subject: Re: IBM AIX 5 and GB18030
Thanks Mark !
That may mean IBM AIX 5 supports conversion between GB18030 and
Unicode, but I don't see this as system-level support, because
there are no locale names
[mailto:unicode-bounce;unicode.org] On Behalf Of Markus Scherer
Sent: Thursday, November 14, 2002 9:18 AM
To: unicode
Subject: Re: IBM AIX 5 and GB18030
Carl W. Brown wrote:
Some Unix systems adapted faster because the later Unicode
adopters used 32-bit Unicode characters, making the job
Markus,
You seem to suggest that there is a problem with 16-bit Unicode.
It does take some effort to adapt
UCS-2-designed functions for UTF-16, but it's not rocket
science and works very well thanks to the
Unicode allocation practice (common characters in the BMP).
Making UTF-8/32 functions
Carl W. Brown cbrown at xnetinc dot com wrote:
Converting from UCS-2 to UTF-16 is just like converting from SBCS to
DBCS. For folks who think in DBCS terms it is no problem. Those who went from
DBCS to Unicode to simplify their lives are, I am sure, not happy.
Ken made me laugh last March by referring
Dear I18N experts,
I have searched all over IBM's web site for information about GB18030 support in
AIX 4.3 and 5, but didn't find anything. I can only see that they support
GB2312 and GBK.
I know IBM was one of the pioneers in supporting GB18030, e.g. with ICU.
But it doesn't make sense that their AIX doesn't
xjliu_ca wrote:
I have searched all over IBM's web site for information about GB18030 support in
AIX 4.3 and 5, but didn't find anything. I can only see that they support
GB2312 and GBK.
Google found something for me:
http://www-3.ibm.com/software/ts/mqseries/support/readme/aix530_read.html
Search for 18030
I cannot find the GB18030 standard in my local library, nor can I find it
anywhere on the Internet. I want to see the standard itself.
GB18030 contains about 27000 characters. CJK contains about 21000 characters
and CJK Extension A about 6000 characters. (I don't remember the actual numbers.) It
seems
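The rough figures above can be checked with a little arithmetic on the block ranges as they stood in Unicode 3.0. At that time, CJK Unified Ideographs ran from U+4E00 to U+9FA5 and Extension A from U+3400 to U+4DB5. Both blocks have grown in later Unicode versions, so these endpoints apply only to the repertoire of that era.

```python
# Block endpoints as of Unicode 3.0; both blocks were extended later.
URO = (0x4E00, 0x9FA5)     # CJK Unified Ideographs
EXT_A = (0x3400, 0x4DB5)   # CJK Unified Ideographs Extension A


def block_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


uro = block_size(*URO)
ext_a = block_size(*EXT_A)
total = uro + ext_a
print(uro, ext_a, total)   # 20902 6582 27484
```

The sum, 27,484 ideographs, is consistent with the "about 27000" figure quoted for GB18030.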
Sorry for the second post; it looks like the standard can
now be downloaded online once you are a
registered member of this site:
(all-on-one-line:)
http://www.sun.com/developers/gadc/technicalpublications/articles/gb18030.html
Best regards,
James Kass.
- Original Message -
From
GB 18030 is aligned to ISO 10646, which does not define the semantic
properties that Unicode does.
--
Tom Emerson Basis Technology Corp.
Sr. Sinostringologist http://www.basistech.com
Beware the lollipop of mediocrity: lick
Sure, I know it could (and will) be implemented with a mapping table. But you
still need to know what U+4ff3a is to define such a mapping table, right? And the
mapping table will still be part of the software package, right? And the user
still won't get your new version of the mapping table until they
Frank,
You don't need to explain to me
the concept of GB18030. My question is about detailed mapping
information.
Now, now, there's no need to get snippy with me. It sounded
like you were unclear from the kinds of questions you were
asking.
I look at
http://oss.software.ibm.com/cvs
From: Yung-Fong Tang [EMAIL PROTECTED]
Can anyone tell me where I can find an online version of the GB18030
standard (yes, I want the STANDARD itself, not someone's paper
about the standard)? Or could anyone tell me where to get a copy of the
standard?
You mean the original Chinese
you do that.
> In particular, DOES GB18030 define a code point to
> code point mapping (beyond the BMP) with Unicode? Unless you can say
> that it does, and show me the specification of how to map between
> them, there is no way people can implement code set conversion between
> GB18030 and Unico
Kenneth Whistler wrote:
Frank,
> You don't need to explain to me
> the concept of GB18030. My question is about detailed mapping
> information.
Now, now, there's no need to get snippy with me. It sounded
like you were unclear from the kinds of questions you were
asking.
Sorry fo
itself and asking for help to get one. Do you have
access to the specification, and DOES it specify that?
Do you not have access to the web? It took me 4 minutes to find the
information on the web. Start with www.google.com and type in GB18030,
and you'll find most of the information right
On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote:
Looks like I beat ICU by checking in my mapping table on April 9 (to
Mozilla), 10 days before they checked in their first version of the GB18030
XML mapping table :) I can probably still claim the first open source
project which supports
http://bugzilla.mozilla.org/show_bug.cgi?id=101998 I also submitted a patch there
(see the bug report). Unfortunately, I haven't had time to test it yet.
It would be nice if someone could code-review that change for me.
Sun folks, do you care about GB18030-to-surrogate conversion in Mozilla?
Please help
Yung-Fong Tang wrote:
... But you
still need to know what U+4ff3a is to define such a mapping table, right?
Wrong. You just need to know the mapping between code points, whether assigned, used,
or whatever.
... So, whatever software the user currently has today, without an
upgrade (either
It took me 4 minutes to find the
information on the web. Start with www.google.com and type in GB18030,
and you'll find most of the information right there. Others have
pointed out more specific links.
No, I am NOT asking for information about the GB18030 standard. I am asking for the
GB18030 standard
ok... you beat me :)
David Starner wrote:
On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote:
Looks like I beat ICU by checking in my mapping table on April 9 (to
Mozilla), 10 days before they checked in their first version of the GB18030
XML mapping table :) I can probably still
From: Yung-Fong Tang
Case mapping? You have no way to generate a mapping table for
case mapping without knowing the character, unless you already
define those characters as having no case or only one case.
Um, Unicode defines a behavior and even properties for unassigned code
points. If you choose not
On Thu, Sep 27, 2001 at 03:03:22PM -0700, Yung-Fong Tang wrote:
David Starner wrote:
If you can't recognize the
character, then just don't convert it.
That may be the quality of others' software; we have a higher standard, however.
Higher standard? If I'm working on Old High German on a
how can you implement tolower(U+4ff3a) without knowing what U+4ff3a is ?
[EMAIL PROTECTED] wrote:
In a message dated 2001-09-24 20:50:25 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
Does GB18030 DEFINE the mapping between GB18030 and the other 11 planes?
I don't think so, since
Do you know where I can get the mapping table between GB18030 and Planes 1 to
16? I can only get the mapping between Plane 0 and GB18030.
Tom Emerson wrote:
Yung-Fong Tang writes:
Does GB18030 DEFINE the mapping between GB18030 and the other 11
planes? I don't think so, since Unicode
David Starner wrote:
On Mon, Sep 24, 2001 at 06:18:19PM -0700, Yung-Fong Tang wrote:
Markus Scherer wrote:
Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030
covers all 17 planes, not just the BMP.
Does GB18030 DEFINE the mapping between GB18030
On Wed, 26 Sep 2001, Yung-Fong Tang wrote:
how can you implement tolower(U+4ff3a) without knowing what U+4ff3a is ?
With a data table. One set of debugged code that handles surrogates,
composing characters, bidirectionality etc. coupled with a datafile that
gets upgraded with each release of
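The "code plus data table" answer can be sketched like this. The lookup code never needs to know what a particular code point is; only the table changes between Unicode releases. The three-entry `LOWER_TABLE` is a hypothetical excerpt for illustration. A real system would generate the full table from the Unicode Character Database rather than write it by hand.

```python
from typing import Dict

# Hypothetical excerpt of a generated data table: code point -> lowercase.
# Regenerating this table (not changing the code) picks up new characters.
LOWER_TABLE: Dict[int, int] = {
    0x0041: 0x0061,    # LATIN CAPITAL LETTER A -> a
    0x0391: 0x03B1,    # GREEK CAPITAL LETTER ALPHA -> alpha
    0x10400: 0x10428,  # DESERET CAPITAL LETTER LONG I -> small long i (plane 1)
}


def to_lower(cp: int, table: Dict[int, int] = LOWER_TABLE) -> int:
    """Table-driven lowercasing of a single code point."""
    # Anything absent from the table, including unassigned code points such
    # as U+4FF3A, has no case mapping and maps to itself.
    return table.get(cp, cp)


print(hex(to_lower(0x4FF3A)))  # 0x4ff3a
```

Python's own `str.lower` behaves the same way for unassigned code points: `chr(0x4FF3A).lower()` returns the character unchanged.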
with the characters, and not the encoded characters
per se. (And this is a disease that was inflicted on the world
23 years ago when Kernighan and Ritchie published a certain
language that unfortunately chose to call its 8-bit numeric
data type a char.)
In particular, DOES GB18030 define a code point to
code
From: Geoffrey Waigh [EMAIL PROTECTED]
It shouldn't require honest-to-goodness we-weren't-kidding
see-here's-one-defined-now characters
In many cases, it did.
for developers to slap themselves on the head
They did -- and they are slapping others around them, too.
and start developing
if you don't have to (C10). GB18030, if it
claims to support Unicode, needs to round-trip both characters.
--
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb,
we'll still be freakin' friends. - Freakin
that for BMP characters? There's a whole lot you can do
without knowing the identity of a character. You can draw the glyph from
a font, which will suffice for a lot of purposes.
In particular, DOES GB18030 define a code point to
code point mapping (beyond the BMP) with Unicode? Unless you can say
Yung-Fong Tang wrote:
Basically, GB18030 is designed to encode all of the Unicode BMP in an encoding which is
backward compatible with GB2312 and GBK.
Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers
all 17 planes, not just the BMP.
markus
Markus Scherer wrote:
Yung-Fong Tang wrote:
Basically, GB18030 is designed to encode all of the Unicode BMP in an encoding which is
backward compatible with GB2312 and GBK.
Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030
covers all 17 planes, not just the BMP.
Does
On Mon, Sep 24, 2001 at 06:18:19PM -0700, Yung-Fong Tang wrote:
Markus Scherer wrote:
Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030
covers all 17 planes, not just the BMP.
Does GB18030 DEFINE the mapping between GB18030 and the other 11 planes? I don't
Yung-Fong Tang writes:
Does GB18030 DEFINE the mapping between GB18030 and the other 11
planes? I don't think so, since Unicode has not defined them yet,
right?
Sure it does. We know what the code points are, even if they don't
have characters assigned to them yet. This allows GB18030
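For the supplementary planes, the mapping really is pure arithmetic on code points, assigned or not. GB18030 gives U+10000 the four-byte sequence 0x90 0x30 0x81 0x30 and continues in order, with the byte ranges 0x90-0xE3, 0x30-0x39, 0x81-0xFE, 0x30-0x39. Below is a Python sketch of both directions; the function names are hypothetical. This linear scheme covers only planes 1-16. The four-byte mappings for BMP code points are not one linear run and need a range table such as ICU's.

```python
def gb18030_encode_supplementary(cp: int) -> bytes:
    """Map a code point in U+10000..U+10FFFF to its GB18030 four-byte sequence."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    idx = cp - 0x10000                  # linear index 0..0xFFFFF
    b4 = 0x30 + idx % 10                # 4th byte: 0x30..0x39
    idx //= 10
    b3 = 0x81 + idx % 126               # 3rd byte: 0x81..0xFE
    idx //= 126
    b2 = 0x30 + idx % 10                # 2nd byte: 0x30..0x39
    idx //= 10
    b1 = 0x90 + idx                     # 1st byte: 0x90..0xE3
    return bytes([b1, b2, b3, b4])


def gb18030_decode_supplementary(seq: bytes) -> int:
    """Inverse of the above for a single four-byte sequence."""
    b1, b2, b3, b4 = seq
    if not (0x90 <= b1 <= 0xE3 and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a supplementary-plane GB18030 sequence")
    idx = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    cp = 0x10000 + idx
    if cp > 0x10FFFF:                   # the tail of the 0xE3 range is unused
        raise ValueError("beyond U+10FFFF")
    return cp


print(gb18030_encode_supplementary(0x20000).hex())  # 95328236
```

Because the mapping is defined for every code point in planes 1-16, a converter written this way handles U+4FF3A today and will still be correct whenever that code point is assigned.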
In a message dated 2001-09-24 20:50:25 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
Does GB18030 DEFINE the mapping between GB18030 and the other 11 planes?
I don't think so, since Unicode has not defined them yet, right?
Unicode defined all the planes, a long long time ago. It's
GB18030
In what ways will this affect Unicode?
Does it contain anything that Unicode doesn't?
r question on the relationship between GB18030
and Unicode.
Cheers,
Thierry.
www.i18ngurus.com - Open Internationalization Resources Directory
On Fri, 21 Sep 2001, Carl W. Brown wrote:
Most systems that handle GB18030 will want to convert it to Unicode first
to reduce processing overhead.
Unless we start seeing Chinese software which is designed to utilize the
compatibility between 18030 and GBK -- font rendering apps
I think I've figured out a way to find the beginning of a GB18030 character starting
anywhere in a document. The algorithm is similar to finding the beginning of a DBCS
character in that you scan backward until you find a byte that can only come at the
start of a character. The main difference
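The message is cut off at this point, so the exact algorithm the poster had in mind is not shown. One way to do this kind of backward resynchronization is sketched below, with hypothetical function names. The idea is to back up to a byte that can only be a complete single-byte character and then parse forward. In GB18030, the bytes 0x00-0x2F, 0x3A-0x3F and 0x7F never occur as the second, third or fourth byte of a multi-byte sequence, so they are safe anchors. ASCII letters and digits are not safe anchors: 0x40-0x7E can be a two-byte trail byte, and 0x30-0x39 can be the second or fourth byte of a four-byte sequence.

```python
def is_sync_byte(b: int) -> bool:
    """True for bytes that can only be a complete single-byte character."""
    return b <= 0x2F or 0x3A <= b <= 0x3F or b == 0x7F


def char_length(data: bytes, i: int) -> int:
    """Length of the character starting at a known boundary i."""
    b = data[i]
    if 0x81 <= b <= 0xFE and i + 1 < len(data):
        # A second byte in 0x30..0x39 marks a four-byte sequence.
        return 4 if 0x30 <= data[i + 1] <= 0x39 else 2
    return 1                            # ASCII, or an invalid/truncated byte


def char_start(data: bytes, pos: int) -> int:
    """Offset of the first byte of the character containing data[pos]."""
    if not 0 <= pos < len(data):
        raise IndexError(pos)
    start = pos
    # Scan backward until the preceding byte is a guaranteed boundary.
    while start > 0 and not is_sync_byte(data[start - 1]):
        start -= 1
    # Parse forward from the boundary until we pass pos.
    while True:
        n = char_length(data, start)
        if start + n > pos:
            return start
        start += n
```

In text with long runs of Chinese and no ASCII punctuation, the backward scan can run all the way to the start of the buffer. Callers that do this often may want to cache known character boundaries.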
Basically, GB18030 is designed to encode all of the Unicode BMP in an encoding which is
backward compatible with GB2312 and GBK.
GB18030 was created because of those characters which are encoded in Unicode
but not in GB2312 or GBK.
Thierry Sourbier wrote:
Charlie,
In what ways