Quiz for Unicode guru

2004-08-19 Thread Frank Yung-Fong Tang
OK, just for fun: a quiz for Unicode gurus. Here is the quiz for Unicoders. It is not a hard quiz; everyone will get it right eventually. So use a stopwatch to measure how long it takes you to figure out the right answer. Note: You can find information about Unicode and UTF-8 from

problems in Public Review 33 UTF Conversion Code Update

2004-05-19 Thread Frank Yung-Fong Tang
Looking at http://www.unicode.org/review/: "33 UTF Conversion Code Update (2004.06.08): The C language source code example for UTF conversions (ConvertUTF.c) has been updated to version 1.2 and is being released for public review and comment. This update
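For readers who have not looked inside ConvertUTF.c: the core of a UTF-32 to UTF-16 conversion is the surrogate-pair arithmetic below. This is a minimal Python sketch, not the reviewed C code; the function name utf32_to_utf16 is hypothetical, and it takes the strict route of rejecting surrogate code points and values above U+10FFFF.

```python
def utf32_to_utf16(code_points):
    """Hypothetical strict UTF-32 -> UTF-16 conversion (sketch, not ConvertUTF.c)."""
    units = []
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF or cp < 0 or cp > 0x10FFFF:
            # Surrogate code points and out-of-range values are ill-formed.
            raise ValueError(f"ill-formed code point: {cp:#x}")
        if cp < 0x10000:
            units.append(cp)  # BMP: a single code unit
        else:
            v = cp - 0x10000                    # 20-bit value
            units.append(0xD800 + (v >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (v & 0x3FF))  # low (trail) surrogate
    return units
```

For example, U+10000 becomes the pair D800 DC00 and U+10FFFF becomes DBFF DFFF.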

Yet another reason some software treats your UTF-8 XML as US-ASCII

2004-05-06 Thread Frank Yung-Fong Tang
Surely no one on this mailing list wants to see their XML treated as US-ASCII when the data is really in UTF-8. If I have an XML file like the following, <?xml version="1.0"?>, and send it over HTTP with the following Content-Type header: Content-Type: text/xml; (without
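For reference, a minimal Python sketch of the RFC 3023 rule behind this: for text/xml with no charset parameter, the default is us-ascii even if the XML declaration says otherwise, whereas application/xml falls back to the XML declaration and then UTF-8. The function name effective_xml_charset is hypothetical, the sketch ignores BOM detection, and it models only RFC 3023 (the later RFC 7303 dropped the us-ascii default).

```python
from email.message import Message

def effective_xml_charset(content_type, xml_decl_encoding=None):
    """Hypothetical helper: charset an RFC 3023 reader would use."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # charset parameter, lowercased, or None
    if charset:
        return charset  # an explicit charset parameter always wins
    if msg.get_content_maintype() == "text":
        # RFC 3023: text/xml without charset means us-ascii,
        # regardless of what the XML declaration says.
        return "us-ascii"
    # application/xml: use the XML declaration, else the XML default (UTF-8).
    return (xml_decl_encoding or "utf-8").lower()
```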

OT: Standardize TimeZone ID

2004-04-23 Thread Frank Yung-Fong Tang
Is there any standards effort to standardize time zone IDs? I am not talking about a time zone as a particular offset (that can be expressed as a GMT offset or addressed by ISO 8601), but rather about an ID that refers to a particular time zone / daylight saving time rule.

unicode site problem

2004-04-22 Thread Frank Yung-Fong Tang
Does anyone know who can fix http://www.unicode.org/reports/index.html? All the links are broken.

Re: GB18030 and super font

2004-04-22 Thread Frank Yung-Fong Tang
Raymond Mercier wrote on 4/22/2004, 7:35 AM: I enquired about the 'super font' created by a Beijing foundry, http://font.founder.com.cn/english/web/index.htm, and am fairly astonished at the prices, as you see from the attached. The cost of producing these fonts is much higher than

Unicode 4.0 and ISO10646-2003

2004-04-22 Thread Frank Yung-Fong Tang
I saw the announcement of the publication of "ISO/IEC 10646:2003, Information technology -- Universal Multiple-Octet Coded Character Set (UCS)" from http://anubis.dkuug.dk/jtc1/sc2/open/02n3729.htm. I expect there is no difference from Unicode 4.0; am I right?

Re: GB18030 and super font

2004-04-22 Thread Frank Yung-Fong Tang
In case you want to test your GB18030 font, you can use Netscape 7 (or the latest Mozilla) and then visit my GB18030 test pages at http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=10. It should be page-for-page compatible with the paper copy of the GB18030-2000 standard. I also

Re: Unicode 4.0 and ISO10646-2003

2004-04-22 Thread Frank Yung-Fong Tang
Kenneth Whistler wrote on 4/22/2004, 3:26 PM: Frank asked: I expect there is no difference from Unicode 4.0; am I right? Correct. Please see Appendix C of Unicode 4.0, p. 1348 and p. 1350, which already explicitly makes this statement. --Ken I don't see ISO10646-2003 in the

Re: help finding radical/stroke index at unicode.org

2004-04-14 Thread Frank Yung-Fong Tang
Are you talking about http://www.unicode.org/charts/unihangridindex.html and http://www.unicode.org/charts/unihanrsindex.html? Gary P. Grosso wrote on 4/14/2004, 1:18 PM: Hi, I am looking for an up-to-date, online version of the sort of thing I see in the back of the printed Unicode

Re: Novice question

2004-03-23 Thread Frank Yung-Fong Tang
Be careful here: for Unicode support in the browser (at least Netscape/Mozilla), there are some code forks between 2000/XP and Win98/ME. Philippe Verdy wrote on 3/23/2004, 5:39 AM: From: Edward H. Trager [EMAIL PROTECTED] Also, I would not bother testing Windows OSes prior to Windows

Re: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-16 Thread Frank Yung-Fong Tang
Chris Jacobs wrote on 3/15/2004, 10:08 PM: - Original Message - From: Kenneth Whistler [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tuesday, March 16, 2004 2:28 AM Subject: Re: in the NEW YORK TIMES today, report of a USA patent for a method to

Re: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-16 Thread Frank Yung-Fong Tang
Maybe I should file a US patent application for writing Arabic from left to right to simplify it :) I guess that would have a higher adoption rate than this font design patent, since most software that does not support Bidi already implements it. :) Mark E. Shoulson wrote on

Re: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-15 Thread Frank Yung-Fong Tang
Wow. It does not seem like a very new idea. A similar idea was used for Chinese 40 years ago and created the differences between Simplified Chinese and Traditional Chinese. Michael Everson wrote on 3/15/2004, 12:40 PM: In the NEW YORK TIMES today comes a report of a USA patent for a new version of

Re: multibyte char display

2004-03-15 Thread Frank Yung-Fong Tang
There are many different reasons you might see "?" there. Read my paper http://people.netscape.com/ftang/paper/unicode25/a302.htm for a list. Manga wrote on 3/15/2004, 10:07 AM: I use UTF-8 encoding in Java code to store multibyte characters in the DB. When I retrieve the multibyte characters
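One of the common reasons, as a minimal Python sketch (the helper name to_legacy is hypothetical): when text passes through a conversion into a charset that cannot represent a character, a lenient converter substitutes "?", and the original character is lost for good.

```python
def to_legacy(text, charset):
    """Hypothetical helper: encode leniently into a legacy charset."""
    # errors="replace" turns every unrepresentable character into b"?".
    return text.encode(charset, errors="replace")
```

After this step, no later decoding can bring the Chinese characters back, because only the byte 0x3F was stored.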

RE: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-15 Thread Frank Yung-Fong Tang
Mike Ayers wrote on 3/15/2004, 2:50 PM: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Frank Yung-Fong Tang Sent: Monday, March 15, 2004 11:16 AM It does not seem like a very new idea. A similar idea was used for Chinese 40 years ago

Re: Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Frank Yung-Fong Tang
I'm not sure where to find an information paper. But one way to check the degree of support is to call GetStringTypeEx against some characters defined in 2.0, 2.1, 3.0, 3.1, 3.2, and 4.0 to see whether the returned results reflect what they should be. Antoine Leca wrote on 3/5/2004, 8:35 AM: Hi
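The same probing idea can be sketched with Python's unicodedata module standing in for GetStringTypeEx (an analogy, not the Win32 API): pick one character first assigned in each version and check whether the character database knows it. The probe characters are taken from the UCD's DerivedAge.txt as I understand it and should be double-checked; an unassigned character reports general category "Cn".

```python
import unicodedata

# One probe per version: characters believed to be first assigned in it.
PROBES = {
    "2.0": "\uAC00",      # HANGUL SYLLABLE GA
    "2.1": "\u20AC",      # EURO SIGN
    "3.0": "\u0710",      # SYRIAC LETTER ALAPH
    "3.1": "\U00010300",  # OLD ITALIC LETTER A
    "3.2": "\u0220",      # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
    "4.0": "\u0221",      # LATIN SMALL LETTER D WITH CURL
}

def probe_support(db=unicodedata):
    """Map each version to True if the database knows its probe character."""
    return {ver: db.category(ch) != "Cn" for ver, ch in PROBES.items()}
```

Passing unicodedata.ucd_3_2_0 (the frozen Unicode 3.2 data Python ships for IDNA) should report the 4.0 probe as missing.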

Re: commandline converter for gb18030 -> utf8 in *nix

2004-03-05 Thread Frank Yung-Fong Tang
You can also use 'nsconv', which comes with the Mozilla source code, for GB18030. See http://www.mozilla.org/projects/l10n/mlp_tools.html for details. Zhang Weiwu wrote on 3/5/2004, 6:43 AM: Hello. I believe this must be a frequent question, but I googled around and I didn't find a satisfying
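If Python happens to be available instead, its built-in gb18030 codec does the same job. A minimal sketch (the function name is hypothetical); decoding is strict, so malformed input raises an error instead of being silently replaced:

```python
def gb18030_to_utf8(data):
    """Hypothetical helper: convert GB18030 bytes to UTF-8 bytes."""
    # Strict by default: malformed GB18030 raises UnicodeDecodeError.
    return data.decode("gb18030").encode("utf-8")
```

On glibc-based systems, `iconv -f GB18030 -t UTF-8` should also work from the shell.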

Re: Font Technology Standards

2004-03-03 Thread Frank Yung-Fong Tang
BDF is also widely used, although its quality and features are not that powerful these days. Also, there are other "standards" for fonts: 1. Glyph set "standards": how to make sure one font contains all the glyphs for a particular group of users. For example, WGL4 is a glyph set standard from

Re: What's in a wchar_t string on unix?

2004-03-03 Thread Frank Yung-Fong Tang
Oh. This is the first time I have heard about this. Thanks for the information. Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is defined? Or does it only mean wchar_t holds characters in ISO 10646 (which means it could be 2 bytes, 4 bytes, or more than that)? Noah Levitt wrote on

Re: What's in a wchar_t string on unix?

2004-03-03 Thread Frank Yung-Fong Tang
not prevent someone from making it 16 bits or 64 bits when that macro is defined, right? And what do the year and month mean? On Mar 03, 2004, at 12:38, Frank Yung-Fong Tang wrote: Oh. This is the first time I have heard about this. Thanks for the information. Does it also mean wchar_t is 4

Re: What's in a wchar_t string on unix?

2004-03-03 Thread Frank Yung-Fong Tang
Clark Cox wrote on 3/3/2004, 4:33 PM: [I swapped the reply order to make my new question clearer] And what do the year and month mean? It indicates which version of ISO 10646 is used by the implementation. In the above example, it indicates whatever version was in effect in December

Re: What's in a wchar_t string on unix?

2004-03-01 Thread Frank Yung-Fong Tang
Rick Cameron wrote on 3/1/2004, 2:13 PM: Hi, all. This may be an FAQ, but I couldn't find the answer on unicode.org. The reason is that there is "NO answer" to the question you ask. It seems that most flavours of Unix define wchar_t to be 4 bytes. It depends on which UNIX

Re: unicode format

2004-02-23 Thread Frank Yung-Fong Tang
John Cowan wrote: steve scripsit: Could someone please clarify the difference between UTF8 and UTF16? If it is possible to encode everything in UTF8, and it is more efficient, what is the need for UTF16? It is more efficient to PROCESS in UTF16.
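To make the size side of that trade-off concrete, a small Python sketch (hypothetical helper name) counting bytes for one character from each range: UTF-8 is smaller for ASCII, UTF-16 is smaller for most CJK, and both need 4 bytes outside the BMP.

```python
def encoded_sizes(text):
    """Hypothetical helper: code point count and encoded byte lengths."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),  # -le: no BOM added
    }
```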

RE: Mother Language Day

2004-02-23 Thread Frank Yung-Fong Tang
joe wrote: (Hmm, in Russian, mother language (maternij jazik) means something *very* different. Watch your language! ;-) He wrote this in English, not Russian, right? How can I watch Chinese (my language)? Joe

Re: Codes for Individual Chinese Brushstrokes

2004-02-20 Thread Frank Yung-Fong Tang
As a native Chinese speaker, I believe: 1. The so-called eight basic strokes are very standard in concept, but there are only 8. 2. They list 8 different variants for each of the 8 basic strokes. But if you read that page carefully, it does not mean that there are only 8 variants for each stroke,

Re: UTF-8 to UTF-16 conversion

2004-02-06 Thread Frank Yung-Fong Tang
Yes, TEC. Look at developer.apple.com and look for the Text Encoding Converter. Paramdeep Ahuja wrote: Hi, can anyone tell me if there is any API available on the Mac to convert from UTF-8 to UTF-16? thnx -P

Re: Detecting encoding in Plain text

2004-01-14 Thread Frank Yung-Fong Tang
Consider CR and LF too. Mark Davis wrote on 1/14/2004, 9:25 AM: I'm not sure which one suggested heuristic method you are referring to, but you are bounding to conclusions. For example, one of the heuristics is to judge what are more common characters when bytes are interpreted as if

Re: Detecting encoding in Plain text

2004-01-14 Thread Frank Yung-Fong Tang
Does Thai use CR and LF? Peter Kirk wrote on 1/14/2004, 8:12 AM: On 14/01/2004 07:16, John Burger wrote: ... By the way, I still don't quite understand what's special about Thai. Could someone elaborate? I mentioned Thai because it is the only language I know of which does

Re: Detecting encoding in Plain text

2004-01-14 Thread Frank Yung-Fong Tang
John Burger wrote on 1/14/2004, 7:16 AM: Mark E. Shoulson wrote: If it's a heuristic we're after, then why split hairs and try to make all the rules ourselves? Get a big ol' mess of training data in as many languages as you can and hand it over to a class full of CS graduate
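A minimal sketch of the simplest layers of such a heuristic, in Python: check for a byte order mark, then try a strict UTF-8 decode, and otherwise fall back to a legacy charset. The fallback windows-1252 is an arbitrary assumption, and the statistical and CR/LF heuristics discussed in this thread are not shown.

```python
import codecs

# UTF-32 LE must be checked before UTF-16 LE: its BOM (FF FE 00 00)
# starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(data, fallback="windows-1252"):
    """Hypothetical helper: BOM sniffing, then UTF-8 validity, then fallback."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")  # strict: any ill-formed sequence fails
        return "utf-8"        # pure ASCII also lands here
    except UnicodeDecodeError:
        return fallback
```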

Re: Programmatic description of ideographic characters

2004-01-03 Thread Frank Yung-Fong Tang
Looks like an old idea that people in Taiwan gave up a long time ago because the glyph quality will never be good enough. Tom Emerson wrote on 1/2/2004, 6:06 PM: The following paper, Chinese Character Synthesis using METAPOST, was recently mentioned in a thread on the teTeX

Re: MS Windows and Unicode 4.0 ?

2003-12-03 Thread Frank Yung-Fong Tang
Come on, take my joke. But that is a perfect example of a language-specific variant glyph, right? Michael Everson wrote: At 17:13 -0800 2003-12-02, Frank Yung-Fong Tang wrote: come on, use language-specific glyph substitution on the last resort font to show an Irish last resort glyph

Re: MS Windows and Unicode 4.0 ?

2003-12-03 Thread Frank Yung-Fong Tang
Peter Kirk wrote: On 02/12/2003 16:25, Frank Yung-Fong Tang wrote: ... a barrier to proper internationalisation? My opinion is the reverse: I think it is a strategy for proper internationalization. Remember, people can always choose to stay with ISO-8859-1 only or go to UTF-8

Re: MS Windows and Unicode 4.0 ?

2003-12-03 Thread Frank Yung-Fong Tang
, it will take 1% of the effort for me to fix it later, right? :) Michael Everson wrote: At 15:38 -0800 2003-12-03, Frank Yung-Fong Tang wrote: I am encouraging QA to test MES-1 with UTF-8 instead of only ISO-8859-1. I am encouraging products to ship with MES-1 support out of the box instead

RE: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
than 10 scripts? I think the value is that it shows people it is not a "?" ASCII question mark itself. -- Frank Yung-Fong Tang, System Architect, International Development, AOL Interactive Services AIM:yungfongta mailto:[EMAIL PROTECTED] Tel:650-937-2913 Yahoo! Msg: frankyungfongtan

RE: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
Subject: Re: MS Windows and Unicode 4.0? I'm interested in knowing whether the following features would soon be found in Windows: fonts for scripts covered by Unicode 4.0, corresponding rendering engine to display all Unicode 4.0 scripts

Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

2003-12-02 Thread Frank Yung-Fong Tang
-8 gzip of SCSU gzip of BOCU-1 gzip of Legacy encoding

Re: How can I have OTF for MacOS

2003-12-02 Thread Frank Yung-Fong Tang
John Jenkins wrote: On Dec 1, 2003, at 4:24 PM, Frank Yung-Fong Tang wrote: John, what 'cmap' format does Apple use in the Mac OS X Devanagari and Bangla fonts? The formats are irrelevant; the Mac supports all the 'cmap' subtable formats for all subtables. For rendering complex

RE: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
Michael Everson wrote: At 14:23 -0800 2003-12-02, Frank Yung-Fong Tang wrote: It's better than not knowing what range the thing is in. It helps the user know he has received, say, Telugu data or whatever. Only if the user knows what Telugu looks like. How many users other

Re: UTF-16 inside UTF-8

2003-12-02 Thread Frank Yung-Fong Tang
Doug Ewell wrote: Frank Yung-Fong Tang ytang0648 at aol dot com wrote: Then, Frank, the Tcl implementation is *not valid UTF-8* and needs to be fixed. Plain and simple. If a system like Tcl only supports the BMP, that is its choice, but it *must not* accept non-shortest UTF-8 forms

Re: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
Peter Kirk wrote: On 02/12/2003 14:19, Frank Yung-Fong Tang wrote: A better approach than asking "Does product X support Unicode 4.0?", to which in some way you can always get a NO answer, is to 1. define a smaller set of functionality (such as MES-1, MES-2, MES-3A); 2. ask 'Does

Re: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
://homepage..mac.com/jhjenkins/

RE: UTF-16 inside UTF-8

2003-12-02 Thread Frank Yung-Fong Tang
Philippe Verdy wrote: Frank Yung-Fong Tang writes: But how about the UTF-16 vs UCS4 battle? Forget it: nearly nobody uses UCS-4 except very internally for string processing at the character level. For whole strings, nearly everybody uses UTF-16 as it performs better with less

Re: creating a test font w/ CJKV Extension B characters.

2003-12-01 Thread Frank Yung-Fong Tang
NT\CurrentVersion\LanguagePack] SURROGATE=dword:0002 [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42] IEFixedFontName=Code2001 IEPropFontName=Code2001 Andrew

Re: How can I have OTF for MacOS

2003-12-01 Thread Frank Yung-Fong Tang
rendering, it cannot support them. John H. Jenkins John, what 'cmap' format does Apple use in the Mac OS X Devanagari and Bangla fonts?

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Frank Yung-Fong Tang
should also compare the same for things like keyword searches and file systems even though it is technically incorrect. Carl

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Frank Yung-Fong Tang
the questioning party is thinking must be given as a part of said question. Oh... really, what kind of Unicode support is in Windows 2.0? (since you said *any*)... No... I don't really care. Don't try to answer me.

Re: Request

2003-11-21 Thread Frank Yung-Fong Tang
with this weird specification, ISCII. (If you don't think it is weird, look at the E-1 Display Attributes section in Annex E of ISCII, which is worse than the E-2 Font Attributes I mentioned here.)

Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang
: Frank Yung-Fong Tang wrote, If you visit http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596 and your machine has surrogate support installed correctly and a surrogate font installed correctly, then you should see the surrogate characters show up matching the gif

Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang
John 3:16 For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life. Does your

Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang
Michael (michka) Kaplan wrote: From: Frank Yung-Fong Tang [EMAIL PROTECTED] So... in summary, what is your conclusion about the quality of GB18030 support on IE6/Win2K? If you run the same test on Mozilla / Netscape 7.0, what is your conclusion about the quality of support

Re: UTF-16 inside UTF-8

2003-11-19 Thread Frank Yung-Fong Tang
. If you still think adding 4-byte UTF-8 support is 1% of the task, then please join the Tcl project and help me fix it. I would appreciate your efforts there, and I believe a lot of people will thank you for your contribution. Doug Ewell wrote: Frank Yung-Fong Tang YTang0648 at aol dot com wrote
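For scale, the bit layout of 4-byte UTF-8 itself is small; Frank's point is that the rest of a BMP-only system is not. A minimal Python sketch of a UTF-8 encoder for a single code point (hypothetical name utf8_encode), checked against Python's own codec:

```python
def utf8_encode(cp):
    """Hypothetical helper: encode one Unicode scalar value as UTF-8."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Supplementary planes: the 4-byte form a BMP-only implementation lacks.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```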

Re: Problems encoding the spanish o

2003-11-19 Thread Frank Yung-Fong Tang
. _ Chat with your friends online using MSN Messenger. http://messenger.microsoft.com/es

Re: What does i18n mean?

2003-11-19 Thread Frank Yung-Fong Tang
bandied about a lot. It is shorthand for "Irn" because it is too hard for most people to type the "r" part. :) [And if your software can save that string and retrieve it correctly later, 50% of the i18n problem is addressed.]

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
about fonts. Could someone recommend a good tutorial or 'font creator' application that addresses surrogate pairs? Thanks, Erik Ostermueller

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
Are you using Netscape 7 / Mozilla or IE? If you use IE, then IE may have a bug there. I think Mozilla should not have the problem, since I developed and tested it myself. [EMAIL PROTECTED] wrote: . Frank Yung-Fong Tang wrote, If you visit http://people.netscape.com/ftang

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
Philippe Verdy wrote: From: Frank Yung-Fong Tang [EMAIL PROTECTED] It is not that easy for you to go from "don't know beans about fonts" to creating a test font that contains ... \u20050. If you are lucky, it will take you several months, if not a year. There are commercial base font tool

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
# ftxinstalledfonts # ftxruler # ftxvalidator John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage..mac.com/jhjenkins/

Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-17 Thread Frank Yung-Fong Tang
Hmm, a very stupid (but working) way: 1. Use vi. 2. Type &#x + the Unicode hex value + ; for each character. 3. Save it as .html. 4. Open the file in a browser. 5. Copy the text. 6. Paste it into your software.
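The numeric character reference trick in step 2 can also be done directly; a small Python sketch (hypothetical function names) that builds the &#x...; reference from a hex code, and the shortcut of turning the hex code straight into the character:

```python
import html

def ncr(hex_code):
    """Hypothetical helper: hex code point -> HTML numeric character reference."""
    return f"&#x{hex_code};"

def char_from_hex(hex_code):
    """Hypothetical helper: hex code point -> the character itself."""
    return chr(int(hex_code, 16))
```

html.unescape(ncr("4E2D")) gives the same character a browser would render for &#x4E2D;.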

Re: newbie 18030 font question

2003-04-03 Thread Yung-Fong Tang
We added GB18030 support to Mozilla, and also added 32-bit cmap support on Windows to Mozilla, about a year ago. The Linux and Mac 32-bit cmap support is a little bit behind. I think we first had GB18030 encoding support in Netscape 6.2. You should be able to see whatever the

Re: Copy/paste in xterm/XEmacs

2003-04-03 Thread Yung-Fong Tang
I think that depends on whether the application supports the newly defined UTF8_STRING for selections. The Linux version of Mozilla implements it, so it can copy/paste with recent versions of xterm without problems. Notice that UTF8_STRING is defined AFTER the X11R6 ICCCM. See the spec in

Re: Problem in unix server with the encoding of pound (&#163;)

2003-03-21 Thread Yung-Fong Tang
Jain, Pankaj (MED, TCS) wrote: Hi, I am generating a pound sign in an HTML preview using an XML/XSLT transformation, and it's working fine on Windows using &#163; in XML, but the same thing is not working on a Unix server. What do you mean by "Unix server"? Displaying the text in a Unix xterm? Or are you
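Without knowing the Unix setup, one frequent cause is an encoding mismatch: &#163; is U+00A3 POUND SIGN, which is one byte in ISO-8859-1 but two bytes in UTF-8. A Python sketch (hypothetical names) showing the bytes, and what UTF-8 output looks like when read as ISO-8859-1:

```python
POUND = "\u00a3"  # POUND SIGN, the character &#163; refers to

def pound_bytes():
    """Hypothetical helper: the pound sign's bytes in two common encodings."""
    return {cs: POUND.encode(cs) for cs in ("iso-8859-1", "utf-8")}

# UTF-8 output displayed as ISO-8859-1 shows an extra character before the pound sign.
misread = POUND.encode("utf-8").decode("iso-8859-1")
```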

Re: Unicode Public Review Issues update

2003-03-18 Thread Yung-Fong Tang
URL, please. Rick McGowan wrote: The Unicode Public Review Issues page has been updated today. Highlights: Closed issue #1 (Language tag deprecation) without any change. Updated some deadlines on other issues to June 1, 2003. Added a document for issue #7 (tailored normalizations).

Re: Characters that rotate in vertical text

2003-03-14 Thread Yung-Fong Tang
I think that is a hard problem. First of all, take a look at http://www.unicode.org/Public/4.0-Update/UCD-4.0.0d5b.html and find the vertical ones. Second, anything which needs to be symmetrically swapped in Bidi probably needs to be changed in the vertical form. (If they need to be changed in horizontal
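The Bidi_Mirrored property mentioned above can be queried directly; a minimal Python sketch (hypothetical helper name) that lists the mirrored characters in a string, as a starting list of candidates for vertical forms rather than a complete answer:

```python
import unicodedata

def mirrored_chars(text):
    """Hypothetical helper: characters with the Bidi_Mirrored property."""
    return [ch for ch in text if unicodedata.mirrored(ch)]
```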

Re: New document.

2003-03-14 Thread Yung-Fong Tang
Otto Stolz wrote: The two scans under http://www.rz.uni-konstanz.de/Antivirus/tests/li.png http://www.rz.uni-konstanz.de/Antivirus/tests/re.png are from the authoritative (until July 1996) book on German orthography: Duden Rechtschreibung der deutschen Sprache und der Fremdwörter / hrsg.

Re: pinyin syllable `rua'

2003-03-14 Thread Yung-Fong Tang
Which pinyin system is "rua" in? I use Simplified Chinese Windows XP, and if I switch to the Full Spell Simplified Chinese IME and type rua', then I get a character (read this email in UTF-8), which is U+633C. I am not sure that is correct. At least, as a native Mandarin speaker, that sound is not natural to me at

Re: sorting order between win98/xp

2003-03-13 Thread Yung-Fong Tang
Dominikus Scherkl wrote: Does anyone know why the sort order is different under those two systems? As I mentioned: a new feature, keeping numbers ordered numerically. I wouldn't mind if they ALSO gave me a flag to control that behavior. Numbers could be used for many different

Re: sorting order between win98/xp

2003-03-13 Thread Yung-Fong Tang
Does anyone know if there is a way to make them sort in the same order? Why should anybody want that? Because users expect a cross-platform (or I should say cross-Windows-version) product to display the same sorting order on Win98 and on WinXP. For example, Netscape 7

Re: sorting order between win98/xp

2003-03-13 Thread Yung-Fong Tang
Michael (michka) Kaplan wrote: From: "Yung-Fong Tang" [EMAIL PROTECTED] One of my colleagues asked me this question. Not much to do with Unicode, though, is it? It will be a Unicode issue if the cause is that the new software tries to implement http://unicode.o

Re: sorting order between win98/xp

2003-03-13 Thread Yung-Fong Tang
We cannot use that. The function you mention compares two Unicode strings. We need a function to "generate a sort key" from Unicode strings instead of comparing two strings. Michael (michka) Kaplan wrote: From: "Yung-Fong Tang" [EMAIL PROTECTED] One of

Re: sorting order between win98/xp

2003-03-13 Thread Yung-Fong Tang
Doug got my point. What I care about is the "difference", not which one is better. Doug Ewell wrote: Dominikus Scherkl Dominikus dot Scherkl at glueckkanja dot com wrote: It is not deterministic string ordering ?!? What's non-deterministic in numeric

Re: Unicode character transformation through XSLT

2003-03-13 Thread Yung-Fong Tang
I have not touched Java for years (probably 5 years)... so I could be wrong. Jain, Pankaj (MED, TCS) wrote: Hi ftang/james.. thanks for the detailed explanation, and now I know the root problem of my error. I have the following string in the database as a Long, in which

Re: farsi calendar components

2003-03-13 Thread Yung-Fong Tang
Check http://emr.cs.iit.edu/home/reingold/calendar-book/second-edition/ Paul Hastings wrote: Does anybody know of any Java Farsi calendar components? Thanks. Paul Hastings [EMAIL PROTECTED] CTO Sustainable Development

Re: sorting order between win98/xp

2003-03-13 Thread Yung-Fong Tang
on the same data and return the same results. Your colleague is mistaken. MichKa - Original Message - From: "Yung-Fong Tang" [EMAIL PROTECTED] To: "Michael (michka) Kaplan" [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, March 13, 2003 4:31 PM Subject: Re: s

Re: wap and utf-8

2003-03-13 Thread Yung-Fong Tang
Mary McCarter wrote: Hi friends, my phone (Motorola i550, i30sx, i85, i60c) doesn't correctly show the character nor &#243;, and it shows "Ã³" instead of "ó". Is that a LATIN CAPITAL LETTER A WITH TILDE and a SUPERSCRIPT THREE? ISO-8859-1 uses 0xC3 to encode LATIN CAPITAL LETTER A WITH TILDE; ISO-8859-1 uses 0xB3
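The two-character display is what you get when the UTF-8 bytes for ó (0xC3 0xB3) are decoded as ISO-8859-1. A one-line Python reproduction (hypothetical helper name):

```python
def utf8_read_as_latin1(text):
    """Hypothetical helper: encode correctly as UTF-8, then misread as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")
```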

Re: Encoding: Unicode Quarterly Newsletter

2003-03-11 Thread Yung-Fong Tang
I hope they can reduce the weight next time by changing the type of paper. My Bible is about 500 pages longer (about 1500+ pages) than the Unicode 3.0 standard but only 50% as thick. The same goes for my Chinese/English dictionary. Otto Stolz wrote: Kenneth Whistler wrote: we can calculate the

Re: Encoding: Unicode Quarterly Newsletter

2003-03-11 Thread Yung-Fong Tang
John H. Jenkins wrote: I certainly think it would be good published with a leather cover, onion-skin paper, and gilt edges, yes. First we have to have Ken divide it into verses, though. I thought we already had verses divided in Chapter 3. That C1-C13/D1-2 stuff

sorting order between win98/xp

2003-03-11 Thread Yung-Fong Tang
One of my colleagues asked me this question. We use LCMapStringW on WinXP and LCMapStringA on Win98 (with LCMAP_SORTKEY), and we get a different sorting order for the following. Example of message list ordering on Win98: TESTING #1, TESTING #10, TESTING #100, TESTING #11. Whereas the message list
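If XP is indeed comparing digit runs numerically, as suggested later in this thread, the difference can be reproduced with a minimal "natural sort" key in Python. This only illustrates the behavior; it is not LCMapString's algorithm, and natural_key is a hypothetical name. Like LCMAP_SORTKEY, it produces a key rather than comparing two strings.

```python
import re

def natural_key(s):
    """Hypothetical sort key: digit runs compare as integers."""
    # re.split with a capture group alternates text, digits, text, ...,
    # so odd indices are always digit runs and list positions stay comparable.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

msgs = ["TESTING #1", "TESTING #10", "TESTING #100", "TESTING #11"]
plain = sorted(msgs)                     # code-point order, like the Win98 list
numeric = sorted(msgs, key=natural_key)  # #1, #10, #11, #100
```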

Re: Unicode character transformation through XSLT

2003-03-11 Thread Yung-Fong Tang
Because the following steps got applied to your Unicode data: 1. Convert \u escapes to Unicode: \uFFE2\uFF80\uFF93 becomes three Unicode characters, U+FFE2, U+FF80, U+FF93. This is OK. 2. A "throw away the high 8 bits" step got applied by your code, so it became 3 bytes: E2 80 93. 3. And some code treats it as UTF-8
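The three steps can be reproduced in Python. The \uFFxx values look like sign-extended bytes (for example, a Java byte 0xE2 cast to char becomes U+FFE2), which is an assumption about where they came from; masking off the high 8 bits then yields the UTF-8 bytes for U+2013 EN DASH. The helper name is hypothetical.

```python
def keep_low_byte(s):
    """Hypothetical helper: step 2, keep only the low 8 bits of each character."""
    return bytes(ord(ch) & 0xFF for ch in s)

mangled = "\uFFE2\uFF80\uFF93"   # step 1 result: three Unicode characters
raw = keep_low_byte(mangled)     # step 2 result: the bytes E2 80 93
dash = raw.decode("utf-8")       # step 3: read as UTF-8 -> U+2013 EN DASH
```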

personal comments about http://www.w3.org/TR/xml11/

2003-03-10 Thread Yung-Fong Tang
.10; Supplementary Private Use Area-B. Also, I doubt we should allow E0000..E007F; Tags to be used as NameStartChar. Frank Yung-Fong Tang

Re: length of text by different languages

2003-03-07 Thread Yung-Fong Tang
Ram Viswanadha wrote: There is also some information at http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html#Test_Results Not sure if this is what you are looking for. Thanks. Not really. I am not looking into the

Re: Need program to convert UTF-8 -> Hex sequences

2003-03-06 Thread Yung-Fong Tang
1. Open your file with N7 (Netscape 7) and change the encoding to UTF-8. 2. Select and copy all the text. 3. Paste it into the first textarea of the attached HTML file. David Oftedal wrote: Hello! Sorry to make this a mass spam, but I need a program to convert UTF-8 to hex sequences. This is useful for embedding
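If a script is preferable to a browser, the conversion is short in Python; a sketch with hypothetical function names producing both space-separated hex and %XX form (for URLs, urllib.parse.quote produces the latter as well):

```python
def utf8_hex(text, sep=" "):
    """Hypothetical helper: UTF-8 bytes as uppercase hex pairs."""
    return sep.join(f"{b:02X}" for b in text.encode("utf-8"))

def utf8_percent(text):
    """Hypothetical helper: UTF-8 bytes as %XX escapes (every byte escaped)."""
    return "".join(f"%{b:02X}" for b in text.encode("utf-8"))
```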

Re: length of text by different languages

2003-03-06 Thread Yung-Fong Tang
Francois Yergeau wrote: [EMAIL PROTECTED] wrote: I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use FEWER characters in writing to communicate information than alphabet-based languages.

Re: length of text by different languages

2003-03-06 Thread Yung-Fong Tang
Francois Yergeau wrote: http://www.unicode.org/iuc/iuc9/Friday2.html#b3 Reuters Compression Scheme for Unicode (RCSU) Misha Wolf Unfortunately, there is no information about German or Japanese. :( It only has Chinese, Farsi, Urdu, Russian, Arabic, Hindi, Korean, Creole, Thai, French, Czech,

Re: length of text by different languages

2003-03-06 Thread Yung-Fong Tang
Thanks, everyone. But I want to point out that punctuation and the space itself should also be considered in your future calculations. Japanese, Chinese, and Thai do not use spaces between words, while Latin-based scripts (or Greek, Korean, Cyrillic, Arabic, Armenian, Georgian, etc.) do use them, and when used for estimating size,

length of text by different languages

2003-03-05 Thread Yung-Fong Tang
I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use FEWER characters in writing to communicate information than alphabet-based languages. Can anyone point me to such research? Martin, do you have some paper

Re: Unicode Arabic Rendering Problem

2003-03-03 Thread Yung-Fong Tang
-Fong Tang [EMAIL PROTECTED] wrote:

Re: Unicode 4.0 BETA available for review

2003-02-28 Thread Yung-Fong Tang
Thanks for letting me know. I guess I haven't spent enough time on www.unicode.org these days :) When did you add those PDFs there? It used to have only some sections available... but that is probably a story from several years ago. Roozbeh Pournader wrote: On Thu, 27 Feb 2003, Mark Davis wrote:

Re: Unicode 4.0 BETA available for review

2003-02-28 Thread Yung-Fong Tang
Doug Ewell wrote: Yung-Fong Tang ftang at netscape dot com wrote: So... in the future, in order to ensure we have a good software environment, we not only need to make Unicode 4.0 clear, but also need to speed up the revision of those RFCs. But the Unicode

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-02-28 Thread Yung-Fong Tang
Kenneth Whistler wrote: Think of it this way. Does anyone expect the ASCII standard to tell, in detail, what a process should or should not do if it receives data which purports to be ASCII, but which contains an 0x80 byte in it? All the ASCII standard can really do is tell you that 0x80 is not

Re: Unicode Arabic Rendering Problem

2003-02-28 Thread Yung-Fong Tang
My test data generator at http://people.netscape.com/ftang/testscript/arabic/arabic.html can probably also help people look at the Arabic behavior. Unfortunately, it is currently coded against Windows-1256 instead of Unicode.

Re: Unicode Arabic Rendering Problem

2003-02-28 Thread Yung-Fong Tang
I think you have both problems, 1 and 2. 1. I think you encoded it the wrong way: you probably should encode figure 2 using U+0644-U+0654-U+0627 and figure 3 using U+0644-U+0627-U+0654. 2. I think there is also a font problem. From my test, all the fonts shipped with MS Windows do not
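One way to see why the two orders are not interchangeable (a sketch about Unicode equivalence, not a claim about any renderer): normalization treats ALEF + HAMZA ABOVE as canonically equivalent to the precomposed U+0623 ALEF WITH HAMZA ABOVE, while LAM + HAMZA ABOVE has no precomposed form. In Python:

```python
import unicodedata

LAM, ALEF, HAMZA_ABOVE = "\u0644", "\u0627", "\u0654"
ALEF_WITH_HAMZA_ABOVE = "\u0623"

fig3 = LAM + ALEF + HAMZA_ABOVE  # figure 3 order from the post
fig2 = LAM + HAMZA_ABOVE + ALEF  # figure 2 order from the post

# NFC composes ALEF + HAMZA ABOVE into U+0623; nothing composes with LAM.
fig3_nfc = unicodedata.normalize("NFC", fig3)
fig2_nfc = unicodedata.normalize("NFC", fig2)
```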

Re: Unicode 4.0 BETA available for review

2003-02-27 Thread Yung-Fong Tang
Stefan Persson wrote: Kenneth Whistler wrote: Unicode 3.0 defined non-shortest UTF-8 as *irregular* code value sequences. There were two types: a. 0xC0 0x80 for U+0000 (instead of 0x00) b. 0xED 0xA0 0x80 0xED 0xB0 0x80 for U+10000 (instead of 0xF0 0x90 0x80 0x80) Ah, but encoding NULL
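Python's UTF-8 decoder is strict by default, so both irregular forms above are rejected; a minimal check (hypothetical helper name). The non-default "surrogatepass" error handler accepts the second form, but it decodes it to two lone surrogate code units rather than to U+10000:

```python
def is_well_formed_utf8(data):
    """Hypothetical helper: True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False
```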

Re: Unicode 4.0 BETA available for review

2003-02-27 Thread Yung-Fong Tang
This discussion has been centered around UTF-8. But I hope the corresponding rules apply to UTF-16 and UTF-32 for Unicode 4.0: - for UTF-32: occurrences of 'surrogates' are ill-formed. How about a UTF-32 sequence whose 4 bytes represent the value U+10 ? Are they considered ill-formed?

Re: Unicode 4.0 BETA available for review

2003-02-27 Thread Yung-Fong Tang
Kent Karlsson wrote: The Unicode 4.0 text further strengthens Conformance Clause C12, to make this crystal clear: C12 When a process generates a code unit sequence which purports to be in a Unicode character encoding form, it shall not emit ill-formed code unit sequences. C12a

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-02-27 Thread Yung-Fong Tang
Likewise, the Unicode Standard tells you what a well-formed UTF-8 byte sequence is. But it is the software designer who has to be smart about determining what his/her software will do when it encounters an error condition and finds itself dealing with a sequence which is ill-formed according to
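As one example of such a design choice, Python lets the caller pick: strict decoding raises an error, while errors="replace" substitutes U+FFFD for ill-formed bytes. A minimal sketch (hypothetical helper name):

```python
def decode_lenient(data):
    """Hypothetical helper: one policy choice, replace ill-formed bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")
```

The stray 0x80 in b"abc\x80def" becomes a single U+FFFD, while a strict decode of the same bytes raises UnicodeDecodeError.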

Re: Unicode 4.0 BETA available for review

2003-02-27 Thread Yung-Fong Tang
I can keep answering these questions, but I can also assure everyone that the UTC worked *very* hard this time around to make the character encoding model much clearer in the Unicode 4.0 text, and to anticipate all these edge cases. --Ken The problem in the past came from two (or more

quoted-string in MIME Content-Type charset parameter

2003-02-27 Thread Yung-Fong Tang
I'm not sure this is the right forum to discuss this issue. I found this "problem" while debugging a UTF-8 email message. When I looked into some email that we had problems with, I saw some Content-Type headers like the following: Content-Type: text/html; charset="UTF-8" As I
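RFC 2045 allows a parameter value to be either a token or a quoted-string, so charset="UTF-8" and charset=UTF-8 should mean the same thing. Python's email package, for example, removes the quotes when asked for the charset; a minimal sketch (hypothetical helper name):

```python
from email.message import Message

def charset_of(content_type):
    """Hypothetical helper: charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # quotes removed, value lowercased
```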

Re: Unicode 4.0 BETA available for review

2003-02-26 Thread Yung-Fong Tang
Kenneth Whistler wrote: If you read through those definitions from Unicode 4.0 carefully, you will see that UTF-8 representing a noncharacter is perfectly valid, but UTF-8 representing an unpaired surrogate code point is ill-formed (and therefore disallowed). I see a hole here. How about
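Python's UTF-8 codec follows the same line: a noncharacter such as U+FFFF or U+FDD0 encodes normally, while an unpaired surrogate code point cannot be encoded at all. A minimal sketch (hypothetical helper name):

```python
def utf8_or_none(ch):
    """Hypothetical helper: UTF-8 bytes for ch, or None if not encodable."""
    try:
        return ch.encode("utf-8")
    except UnicodeEncodeError:
        return None  # e.g. an unpaired surrogate: ill-formed in UTF-8
```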

Re: please review the paper for me

2003-02-26 Thread Yung-Fong Tang
I think that is a very common mistake people WILL make. Doug Ewell wrote: Thanks to all who pointed out that noncharacters, unlike surrogate code points, are NOT illegal or invalid in UTF-8 or any other CES. I don't know why I said they were. (Bad brain! Bad, bad brain!) -Doug Ewell Fullerton,
