GB18030 and super font

2004-04-22 Thread Raymond Mercier
I am intrigued by GB18030 encoding. There is a table of equivalences in http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml No doubt Unihan will at some stage include these 2- and 4-byte values. I enquired about the 'super font' created by a Beijing foundry, http

Re: GB18030 and super font

2004-04-22 Thread Mark E. Shoulson
Raymond Mercier wrote: I am intrigued by GB18030 encoding. There is a table of equivalences in http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml No doubt Unihan will at some stage include these 2- and 4-byte values. I enquired about the 'super font' created

GB18030 and super font

2004-04-22 Thread Raymond Mercier
Mark Shoulson writes: their Super Font is bundled with Microsoft Office XP, and even Microsoft's prices haven't gotten that high! From Microsoft, http://www.microsoft.com/globaldev/DrIntl/columns/015/default.mspx : "A font that contains Simplified Chinese glyphs from both CJK Extension A and B

Re: GB18030 and super font

2004-04-22 Thread Ernest Cline
Possibly they were quoting the price for one to be able to bundle their font with software that you would sell. Judging by the website, I don't think that their intent is to sell directly to individual users. In that context, the price doesn't seem unreasonable at all. When you consider that

Re: GB18030 and super font

2004-04-22 Thread Philippe Verdy
From: Mark E. Shoulson [EMAIL PROTECTED] Raymond Mercier wrote: I am intrigued by GB18030 encoding. There is a table of equivalences in http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml No doubt Unihan will at some stage include these 2- and 4-byte values. I

Re: GB18030 and super font

2004-04-22 Thread Eric Muller
Raymond Mercier wrote: But that link to proofing tools leads nowhere. Maybe it's not so easy to get the CHS version. http://www.amazon.com/exec/obidos/tg/detail/-/BBZ54P/qid=1082651762/sr=8-1/ref=pd_ka_1/103-8333725-5907026?v=glances=softwaren=507846 Includes ~140

Re: GB18030 and super font

2004-04-22 Thread Raymond Mercier
- From: Eric Muller To: [EMAIL PROTECTED] Sent: Thursday, April 22, 2004 5:40 PM Subject: Re: GB18030 and super font Raymond Mercier wrote: But that link to proofing tools leads nowhere. Maybe it's not so easy to get the CHS version. http://www.amazon.com/exec

Re: GB18030 and super font

2004-04-22 Thread Frank Yung-Fong Tang
Raymond Mercier wrote on 4/22/2004, 7:35 AM: I enquired about the 'super font' created by a Beijing foundry, http://font.founder.com.cn/english/web/index.htm, and am fairly astonished at the prices, as you see from the attached. The cost of producing these fonts is much higher than

Re: GB18030 and super font

2004-04-22 Thread Peter Kirk
On 22/04/2004 10:04, Raymond Mercier wrote: Eric, Amazin' Amazon!! Now why didn't I think of that? In fact the UK Amazon.co.uk says it is discontinued, so I would have to get it from Amazon in the US. It is not the first time that the two Amazons fail to connect. Many thanks for the tip,

Re: GB18030 and super font

2004-04-22 Thread Frank Yung-Fong Tang
In case you want to test your GB18030 font, you can use Netscape 7 (or the latest Mozilla) and then visit my GB18030 test pages at http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=10 It should be page-for-page compatible with the paper copy of the GB18030-2000 standard. I also

Re: GB18030 and super font

2004-04-22 Thread Eric Muller
Raymond Mercier wrote: Mark Shoulson writes: their Super Font is bundled with Microsoft Office XP, and even Microsoft's prices haven't gotten that high! From Microsoft, http://www.microsoft.com/globaldev/DrIntl/columns/015/default.mspx : "A font that contains

commandline converter for gb18030 - utf8 in *nix

2004-03-05 Thread Zhang Weiwu
Hello. I believe this must be a frequent question, but I googled around and I didn't find a satisfying tool. It seems most converters do GB2312 but not GB18030. I have 100+ files to convert; normal graphical/web-based converters won't do the work well. On my FreeBSD there is a ported tool

Re: commandline converter for gb18030 - utf8 in *nix

2004-03-05 Thread Peter Jacobi
Hello. I believe this must be a frequent question, but I googled around and I didn't find a satisfying tool. It seems most converters do GB2312 but not GB18030. Both GNU libc iconv and GNU libiconv support GB18030. I assume the libiconv distribution includes the command line utility. Regards
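For a batch of 100+ files, the same conversion can also be scripted. The following is only a minimal sketch, assuming Python 3 is available (its standard library ships a `gb18030` codec); the function names and the `*.txt` pattern are made up for illustration and are not from the thread. It does roughly what `iconv -f GB18030 -t UTF-8` does for each file.

```python
from pathlib import Path


def gb18030_to_utf8(data: bytes) -> bytes:
    """Re-encode GB18030 bytes as UTF-8.

    Raises UnicodeDecodeError if the input is not valid GB18030,
    so malformed files are reported rather than silently damaged.
    """
    return data.decode("gb18030").encode("utf-8")


def convert_tree(src_dir: str, dst_dir: str, pattern: str = "*.txt") -> int:
    """Convert every file matching `pattern` (a hypothetical default)
    under src_dir, writing the UTF-8 copies into dst_dir.
    Returns the number of files converted."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(Path(src_dir).glob(pattern)):
        (out / src.name).write_bytes(gb18030_to_utf8(src.read_bytes()))
        count += 1
    return count
```

Because GB2312 and GBK text is also valid GB18030, the same helper should handle older files as well.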

Re: commandline converter for gb18030 - utf8 in *nix

2004-03-05 Thread Zhang Weiwu
Peter Jacobi wrote: Hello. I believe this must be a frequent question, but I googled around and I didn't find a satisfying tool. It seems most converters do GB2312 but not GB18030. Both GNU libc iconv and GNU libiconv support GB18030. I assume the libiconv distribution includes the command

Re: commandline converter for gb18030 - utf8 in *nix

2004-03-05 Thread Frank Yung-Fong Tang
You can also use 'nsconv', which comes with the Mozilla source code and supports GB18030. See http://www.mozilla.org/projects/l10n/mlp_tools.html for details. Zhang Weiwu wrote on 3/5/2004, 6:43 AM: Hello. I believe this must be a frequent question, but I googled around and I didn't find a satisfying tool

GB18030 mapping table....

2003-08-19 Thread Addison Phillips [wM]
Hi Will, The ICU library is a good source for information like this. See: http://oss.software.ibm.com/icu/charset/ The data table is located here: http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml Read the note on the first page. There are official sources as well, but I

RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

2002-11-15 Thread Carl W. Brown
Doug, However, 16 bit characters were a hard enough sell in the good old days. If we had started out withug 2bit characters we would still be dreaming about Unicode. I think Carl meant with 32-bit characters. I don't know what kind of word withug is (Old English?), but I like it. It

Re: IBM AIX 5 and GB18030

2002-11-15 Thread Markus Scherer
Michael Yau wrote: Markus, The standard does _not_ require to _process_ internally in GB18030. It is sufficient to have a converter and to process in Unicode, which does contain all of the characters. Just curious, do you have this in writing from the China standards body? I don't

Re: IBM AIX 5 and GB18030

2002-11-15 Thread Markus Scherer
Jane, you are right, I over-simplified. I tried to make the point that you need not _process_ text in GB18030 but that Unicode processing and conversion to/from GB18030 fulfills the requirement to be able to read and write GB18030 text. Yes, you need to have font support for all the characters
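The "convert at the boundary, process in Unicode" approach described above can be sketched as follows. This is an illustration only, assuming Python 3 and its standard `gb18030` codec; `process_gb18030` is a hypothetical helper name, not an API from AIX or ICU.

```python
def process_gb18030(data: bytes, transform) -> bytes:
    """Read GB18030 bytes, process the text as Unicode, write GB18030 back.

    `transform` is any str -> str function; all real work happens on
    Unicode strings, never on the GB18030 byte sequences themselves.
    """
    text = data.decode("gb18030")      # read: GB18030 -> Unicode
    result = transform(text)           # process entirely in Unicode
    return result.encode("gb18030")    # write: Unicode -> GB18030


# U+3400 is the first CJK Extension A ideograph; GB18030 gives it a
# four-byte code, but the processing step never needs to know that.
sample = "abc\u3400".encode("gb18030")
upper = process_gb18030(sample, str.upper)
```

Fonts, input methods, and locale support are separate requirements that a converter alone does not address.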

RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

2002-11-15 Thread John McConnell
that I shouldn't care. John Microsoft -Original Message- From: Doug Ewell [mailto:dewell;adelphia.net] Sent: Thursday, November 14, 2002 8:26 PM To: Unicode Mailing List Cc: Carl W. Brown Subject: Re: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030 Carl W. Brown cbrown at xnetinc dot com

Re: IBM AIX 5 and GB18030

2002-11-14 Thread Jane Liu
Thanks Mark! That may mean IBM AIX 5 supports conversion between GB18030 and Unicode, but I don't see this as system-level support because there are no locale names for GB18030 in the docs of AIX 5: http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/locale.htm Zh_CN

RE: IBM AIX 5 and GB18030

2002-11-14 Thread Carl W. Brown
Jane, One of the problems is that early Unicode adopters used the 16-bit UCS-2 encoding form of Unicode. Converting to UTF-16 requires surrogate support. Some of the GB18030 characters require this support. ICU is dedicated to Unicode support so a lot of effort is put into ICU to keep it up
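To make "surrogate support" concrete: a code point above U+FFFF does not fit in one 16-bit unit, so UTF-16 splits it into a high and a low surrogate. A minimal sketch of the standard UTF-16 arithmetic, with illustrative function names:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF)
    into its UTF-16 high and low surrogate code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                      # 20-bit offset
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+20000, the first CJK Extension B ideograph:
high, low = to_surrogates(0x20000)   # (0xD840, 0xDC00)
```

Code written for UCS-2 treats each 16-bit unit as a whole character, which is why it has to be adapted before it can handle such pairs.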

Re: IBM AIX 5 and GB18030

2002-11-14 Thread Markus Scherer
string handling assume that the single-code-point type is the same as the string base unit. This one design point requires 32-bit wchar_t not just for Unicode but also for the character sets of EUC-TW and GB18030. You seem to suggest that there is a problem with 16-bit Unicode. It does take some

Re: IBM AIX 5 and GB18030

2002-11-14 Thread Markus Scherer
Jane Liu wrote: That may mean IBM AIX 5 supports conversion between GB18030 and Unicode, but I don't see this as system-level support because there are no locale names for GB18030 in the docs of AIX 5: The GB 18030 standard requires software to be able to _read and write_ text in the GB18030

Re: IBM AIX 5 and GB18030

2002-11-14 Thread Michael \(michka\) Kaplan
From: Carl W. Brown [EMAIL PROTECTED] Other companies like Microsoft took a very big gamble and implemented the code for surrogate support into Windows 2000 based on early drafts of the Unicode standard. If they had not done it this way or had guessed wrong they might not even have support

Re: IBM AIX 5 and GB18030

2002-11-14 Thread Michael Yau
Markus, The standard does _not_ require to _process_ internally in GB18030. It is sufficient to have a converter and to process in Unicode, which does contain all of the characters. Just curious, do you have this in writing from the China standards body? - Michael Markus Scherer wrote

Re: IBM AIX 5 and GB18030

2002-11-14 Thread Jane Liu
Mark, I think a converter alone is not sufficient. How about the following support: - IME (to input CJK Ext. A characters through GB18030/Unicode code) - X-Windows font support - iconv support - mbtowc(), mbstowcs(), mblen()... - and so on... You need to be able to do what you can do on Solaris

Re: IBM AIX 5 and GB18030

2002-11-14 Thread Joe Ross
To: Markus Scherer [EMAIL PROTECTED], unicode [EMAIL PROTECTED] cc: Subject: Re: IBM AIX 5 and GB18030 Thanks Mark! That may mean IBM AIX 5 supports conversion between GB18030 and Unicode, but I don't see this as system-level support because there are no locale names

RE: IBM AIX 5 and GB18030

2002-11-14 Thread Carl W. Brown
] [mailto:unicode-bounce;unicode.org]On Behalf Of Markus Scherer Sent: Thursday, November 14, 2002 9:18 AM To: unicode Subject: Re: IBM AIX 5 and GB18030 Carl W. Brown wrote: Some Unix systems adapted faster because the later Unicode adopters used 32 bit Unicode characters making the job

UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

2002-11-14 Thread Carl W. Brown
Markus, You seem to suggest that there is a problem with 16-bit Unicode. It does take some effort to adapt UCS-2-designed functions for UTF-16, but it's not rocket science and works very well thanks to the Unicode allocation practice (common characters in the BMP). Making UTF-8/32 functions

Re: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

2002-11-14 Thread Doug Ewell
Carl W. Brown cbrown at xnetinc dot com wrote: Converting from UCS-2 to UTF-16 is just like converting from SBCS to DBCS. For folks who think DBCS it is no problem. Those who went from DBCS to Unicode to simplify their lives I am sure are not happy. Ken made me laugh last March by referring

IBM AIX 5 and GB18030

2002-11-13 Thread xjliu_ca
Dear I18N experts, I have searched all over IBM's web site for GB18030 support in AIX 4.3 and 5, but didn't find anything. I can only see that they support GB2312 and GBK. I know IBM was one of the pioneers in supporting GB18030, i.e. their ICU. But it doesn't make sense their AIX doesn't

Re: IBM AIX 5 and GB18030

2002-11-13 Thread Markus Scherer
xjliu_ca wrote: I have searched all over IBM's web site for GB18030 support in AIX 4.3 and 5, but didn't find anything. I can only see that they support GB2312 and GBK. Google found something for me: http://www-3.ibm.com/software/ts/mqseries/support/readme/aix530_read.html Search for 18030

is GB18030 a combination of CJK and CJK extension?

2002-07-19 Thread Zhang Weiwu
I cannot find the GB18030 standard in the local library, nor can I find it anywhere on the Internet. I wish to read the standard itself. GB18030 contains about 27000 characters. CJK contains about 21000 characters and CJK Extension A 6000 characters. (I don't remember the actual numbers.) It seems

Re: is GB18030 a combination of CJK and CJK extension?

2002-07-19 Thread James Kass
Sorry, second post. It looks like the standard can now be downloaded online once you are a registered member of this site: (all-on-one-line:) http://www.sun.com/developers/gadc/technicalpublications/articles/gb18030.html Best regards, James Kass. - Original Message - From

Re: GB18030

2001-09-27 Thread Tom Emerson
GB 18030 is aligned to ISO 10646, which does not define the semantic properties that Unicode does. -- Tom Emerson Basis Technology Corp. Sr. Sinostringologist http://www.basistech.com Beware the lollipop of mediocrity: lick

Re: GB18030

2001-09-27 Thread Yung-Fong Tang
Sure, I know it could (and will) be implemented by a mapping table. But you still need to know what U+4ff3a is to define such a mapping table, right? And the mapping table will still be part of the software package, right? And the user still won't get your new version of the mapping table until they

Re: GB18030

2001-09-27 Thread Kenneth Whistler
Frank, You don't need to explain to me the concept of GB18030. The question I have is about detailed mapping information. Now, now, there's no need to get snippy with me. It sounded like you were unclear from the kinds of questions you were asking. I look at http://oss.software.ibm.com/cvs

Re: GB18030

2001-09-27 Thread Michael \(michka\) Kaplan
From: Yung-Fong Tang [EMAIL PROTECTED] Can anyone tell me where I can find an online version of the GB18030 standard (yes, I want the STANDARD itself, not someone's paper talking about the standard)? Or could anyone tell me where to get a copy of the standard? You mean the original Chinese

Re: GB18030

2001-09-27 Thread Yung-Fong Tang
you do that. In particular, DOES GB18030 define code point to code point mapping (beyond BMP) to and from Unicode? Unless you can say that the answer is YES and show me the specification of how to map between them, there is no way people can implement code set conversion between GB18030 and Unico

Re: GB18030

2001-09-27 Thread Yung-Fong Tang
Kenneth Whistler wrote: Frank, You don't need to explain to me the concept of GB18030. The question I have is about detailed mapping information. Now, now, there's no need to get snippy with me. It sounded like you were unclear from the kinds of questions you were asking. Sorry fo

Re: GB18030

2001-09-27 Thread David Starner
itself and asking for help to get one. Do you have access to the specification, and DOES it specify so? Do you not have access to the web? It took me 4 minutes to find the information on the web. Start with www.google.com and type in GB18030, and you'll find most of the information right

Re: GB18030

2001-09-27 Thread David Starner
On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote: looks like I beat ICU by checking in my mapping table on April 9 (to Mozilla), 10 days before they checked in their first version of the GB18030 XML mapping table :) I probably can still claim the first open source project which supports

Re: GB18030

2001-09-27 Thread Yung-Fong Tang
http://bugzilla.mozilla.org/show_bug.cgi?id=101998 I also submitted a patch there (see the bug report). Unfortunately, I don't have time to test it yet. It would be nice if someone could code review that change for me. Sun folks, do you care about GB18030 to surrogate conversion in Mozilla? Please help

Re: GB18030

2001-09-27 Thread Markus Scherer
Yung-Fong Tang wrote: ... But you still need to know what U+4ff3a is to define such a mapping table, right? Wrong. You just need to know the mapping between code points, whether assigned, used, or whatever. ... So, whatever software the user currently has today, without an upgrade (either

Re: GB18030

2001-09-27 Thread Yung-Fong Tang
? It took me 4 minutes to find the information on the web. Start with www.google.com and type in GB18030, and you'll find most of the information right there. Others have pointed out more specific links. No, I am NOT asking for information about the GB18030 standard. I am asking for the GB18030 standard

Re: GB18030

2001-09-27 Thread Yung-Fong Tang
ok... you beat me :) David Starner wrote: On Thu, Sep 27, 2001 at 12:27:11PM -0700, Yung-Fong Tang wrote: looks like I beat ICU by checking in my mapping table on April 9 (to Mozilla), 10 days before they checked in their first version of the GB18030 XML mapping table :) I probably can still

Re: GB18030

2001-09-27 Thread Michael \(michka\) Kaplan
From: Yung-Fong Tang Case mapping? You have no way to generate a mapping table for case mapping without knowing the character, unless you have already defined those characters to have no case or only one case. Um, Unicode defines a behavior and even properties for unassigned code points. If you choose not

Re: GB18030

2001-09-27 Thread David Starner
On Thu, Sep 27, 2001 at 03:03:22PM -0700, Yung-Fong Tang wrote: David Starner wrote: If you can't recognize the character, then just don't convert it. That may be the quality of others' software; we have a higher standard, however. Higher standard? If I'm working on Old High German on a

Re: GB18030

2001-09-26 Thread Yung-Fong Tang
How can you implement tolower(U+4ff3a) without knowing what U+4ff3a is? [EMAIL PROTECTED] wrote: In a message dated 2001-09-24 20:50:25 Pacific Daylight Time, [EMAIL PROTECTED] writes: Does GB18030 DEFINE the mapping between GB18030 and the rest of 11 planes? I don't think so, since

Re: GB18030

2001-09-26 Thread Yung-Fong Tang
Do you know where I can get the mapping table between GB18030 and Planes 1 to 16? I can only get the mapping between Plane 0 and GB18030. Tom Emerson wrote: Yung-Fong Tang writes: Does GB18030 DEFINE the mapping between GB18030 and the rest of 11 planes? I don't think so, since Unicode

Re: GB18030

2001-09-26 Thread Yung-Fong Tang
David Starner wrote: On Mon, Sep 24, 2001 at 06:18:19PM -0700, Yung-Fong Tang wrote: Markus Scherer wrote: Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers all 17 planes, not just the BMP. Does GB18030 DEFINE the mapping between GB18030

Re: GB18030

2001-09-26 Thread Geoffrey Waigh
On Wed, 26 Sep 2001, Yung-Fong Tang wrote: how can you implement tolower(U+4ff3a) without knowing what U+4ff3a is ? With a data table. One set of debugged code that handles surrogates, composing characters, bidirectionality etc. coupled with a datafile that gets upgraded with each release of
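The data-table approach can be sketched in a few lines. The table below is a tiny hypothetical excerpt for illustration (a real implementation would load it from the Unicode Character Database); the point is that the code itself never changes, and a code point missing from the table simply maps to itself.

```python
# Hypothetical three-entry excerpt of an uppercase -> lowercase table.
# In practice this data would be generated from UnicodeData.txt and
# replaced with each new release of the standard.
LOWERCASE_TABLE = {
    0x0041: 0x0061,  # LATIN CAPITAL LETTER A -> a
    0x00C9: 0x00E9,  # LATIN CAPITAL LETTER E WITH ACUTE -> e with acute
    0x0416: 0x0436,  # CYRILLIC CAPITAL LETTER ZHE -> zhe
}


def to_lower(cp: int) -> int:
    """Table-driven lowercase mapping for a single code point.

    Code points with no table entry, including unassigned ones such as
    U+4FF3A, are returned unchanged.
    """
    return LOWERCASE_TABLE.get(cp, cp)
```

If U+4FF3A were ever assigned a cased character, only the data file would need updating.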

Re: GB18030

2001-09-26 Thread Kenneth Whistler
with the characters, and not the encoded characters per se. (And this is a disease that was inflicted on the world 23 years ago when Kernighan and Ritchie published a certain language that unfortunately chose to call its 8-bit numeric data type a char.) In particular, DOES GB18030 define code point to code

Re: GB18030

2001-09-26 Thread Michael \(michka\) Kaplan
From: Geoffrey Waigh [EMAIL PROTECTED] It shouldn't require honest-to-goodness we-weren't-kidding see-here's-one-defined-now characters In many cases, it did. for developers to slap themselves on the head They did -- and they are slapping others around them, too. and start developing

Re: GB18030

2001-09-26 Thread David Starner
if you don't have to (C10). GB18030, if it claims to support Unicode, needs to round-trip both characters. -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org When the aliens come, when the deathrays hum, when the bombers bomb, we'll still be freakin' friends. - Freakin

Re: GB18030

2001-09-26 Thread David Starner
that for BMP characters? There's a whole lot you can do without knowing the identity of a character. You can draw the glyph from a font, which will suffice for a lot of purposes. In particular, DOES GB18030 define code point to code point mapping (beyond BMP) to and from Unicode? Unless you can say

Re: GB18030

2001-09-24 Thread Markus Scherer
Yung-Fong Tang wrote: basically GB18030 is designed to encode all of the Unicode BMP in an encoding which is backward compatible with GB2312 and GBK. Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers all 17 planes, not just the BMP. markus

Re: GB18030

2001-09-24 Thread Yung-Fong Tang
Markus Scherer wrote: Yung-Fong Tang wrote: basically GB18030 is designed to encode all of the Unicode BMP in an encoding which is backward compatible with GB2312 and GBK. Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers all 17 planes, not just the BMP. Does

Re: GB18030

2001-09-24 Thread David Starner
On Mon, Sep 24, 2001 at 06:18:19PM -0700, Yung-Fong Tang wrote: Markus Scherer wrote: Correction: to encode _all_ of Unicode, not just all Unicode BMP - GB 18030 covers all 17 planes, not just the BMP. Does GB18030 DEFINE the mapping between GB18030 and the rest of 11 planes? I don't

Re: GB18030

2001-09-24 Thread Tom Emerson
Yung-Fong Tang writes: Does GB18030 DEFINE the mapping between GB18030 and the rest of 11 planes? I don't think so, since Unicode has not defined them yet, right? Sure it does. We know what the code points are, even if they don't have characters assigned to them yet. This allows GB18030
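The reason no per-character table is needed beyond the BMP is that the supplementary code points U+10000..U+10FFFF map linearly onto a contiguous run of four-byte GB18030 codes starting at 90 30 81 30. A minimal sketch of that arithmetic follows; the function name is illustrative, and the result can be cross-checked against any existing GB18030 converter.

```python
def supplementary_to_gb18030(cp: int) -> bytes:
    """Map a supplementary code point (U+10000..U+10FFFF) to its
    four-byte GB18030 code by pure arithmetic, with no lookup table.

    Four-byte codes have the shape:
      byte 1: 0x81..0xFE, byte 2: 0x30..0x39,
      byte 3: 0x81..0xFE, byte 4: 0x30..0x39
    and the supplementary range starts at byte 1 = 0x90.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    linear = cp - 0x10000
    linear, b4 = divmod(linear, 10)    # lowest digit: 10 values
    linear, b3 = divmod(linear, 126)   # next digit: 126 values
    b1, b2 = divmod(linear, 10)        # remaining two digits
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])
```

The mapping works the same whether or not a character has been assigned at the code point, which is the point being made in the reply above.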

Re: GB18030

2001-09-24 Thread DougEwell2
In a message dated 2001-09-24 20:50:25 Pacific Daylight Time, [EMAIL PROTECTED] writes: Does GB18030 DEFINE the mapping between GB18030 and the rest of 11 planes? I don't think so, since Unicode has not defined them yet, right? Unicode defined all the planes, a long long time ago. It's

GB18030

2001-09-21 Thread Charlie Jolly
GB18030 In what ways will this affect Unicode? Does it contain anything that Unicode doesn't?

Re: GB18030

2001-09-21 Thread Thierry Sourbier
r question on the relationship between GB18030 and Unicode. Cheers, Thierry. www.i18ngurus.com - Open Internationalization Resources Directory

RE: GB18030

2001-09-21 Thread Sampo Syreeni
On Fri, 21 Sep 2001, Carl W. Brown wrote: Most systems that handle GB18030 will want to convert it to Unicode first to reduce processing overhead. Unless we start seeing Chinese software which is designed to utilize the compatibility between 18030 and GBK -- font rendering apps

RE: GB18030

2001-09-21 Thread Murray Sargent
I think I've figured out a way to find the beginning of a GB18030 character starting anywhere in a document. The algorithm is similar to finding the beginning of a DBCS character in that you scan backward until you find a byte that can only come at the start of a character. The main difference
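The message above is cut off before the details, so the following is not Murray Sargent's algorithm. It is a simpler, related sketch under stated assumptions: the input is well-formed GB18030, and instead of deciding purely by scanning backward, it backs up to a byte that can never be a trail byte (or to the start of the buffer) and then parses forward. All function names are illustrative.

```python
def is_unambiguous(b: int) -> bool:
    """True for bytes that are always a complete single-byte character:
    they are not lead bytes and never occur as the 2nd, 3rd or 4th byte
    of a multi-byte sequence (trail ranges are 0x30-0x39, 0x40-0x7E
    and 0x80-0xFE)."""
    return b <= 0x2F or 0x3A <= b <= 0x3F or b == 0x7F


def sequence_length(data: bytes, i: int) -> int:
    """Length of the character starting at index i (which must be a
    character boundary in well-formed GB18030)."""
    b = data[i]
    if b <= 0x80 or b == 0xFF:
        return 1                       # ASCII (0x80/0xFF treated as 1 byte)
    if i + 1 < len(data) and 0x30 <= data[i + 1] <= 0x39:
        return 4                       # lead byte followed by a digit
    return 2


def char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the character that contains data[pos]."""
    start = pos
    # Back up until the previous byte is a guaranteed character boundary.
    while start > 0 and not is_unambiguous(data[start - 1]):
        start -= 1
    # Parse forward from the known boundary until we cover pos.
    i = start
    while True:
        n = sequence_length(data, i)
        if i + n > pos:
            return i
        i += n
```

In the worst case (a long run with no spaces, punctuation, or line breaks) the backward scan reaches the start of the buffer, which is the cost of keeping the logic this simple.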

Re: GB18030

2001-09-21 Thread Yung-Fong Tang
Basically GB18030 is designed to encode all of the Unicode BMP in an encoding which is backward compatible with GB2312 and GBK. GB18030 was born because there are characters encoded in Unicode that are encoded in neither GB2312 nor GBK. Thierry Sourbier wrote: Charlie, In what ways