Re: Re:Some Unicode Characters are not displayed in my browser(IE 5.0)
At Sun, 16 Jun 2002 03:37:43 +0300, Altug B. Altintas [EMAIL PROTECTED] wrote:

> I shouldn't use UTF-8 or UTF-16 during String conversions, like
> fileid = new String(temp.getBytes(), "UTF-8");

This is a hack you need to perform in your Servlet code to get the correct characters that the user submitted in HTML forms.

> or like this
> byte[] byteArray = someString.getBytes("UTF-16");

This code itself isn't wrong at all. My point was that you don't need to do any of this in your JSP code. The page directive takes care of it instead.

--- Shigemichi Yazawa [EMAIL PROTECTED]
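The Servlet-side hack discussed in this message can be sketched roughly as follows. This is only an illustration, assuming the container decoded the submitted form bytes as ISO-8859-1 while the browser actually sent UTF-8. The class and method names (FormDecodingSketch, misdecode, recover) are hypothetical. The code quoted in the thread calls getBytes() with no argument, which depends on the platform default charset, so explicit charsets are used here to keep the sketch deterministic.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the names are illustrative, not from the thread.
class FormDecodingSketch {
    // Simulates a container that wrongly decodes UTF-8 form bytes as ISO-8859-1.
    static String misdecode(String submitted) {
        byte[] wire = submitted.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The "hack": get the raw bytes back, then decode them as UTF-8.
    static String recover(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // three kanji, written as escapes
        String garbled = misdecode(original);
        System.out.println(garbled.equals(original));          // false
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character. With a different wrongly assumed charset, the bytes may not be recoverable.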
Re: Some Unicode Characters are not displayed in my browser(IE 5.0)
At Sat, 15 Jun 2002 17:13:52 +0530, Sreedhar.M [EMAIL PROTECTED] wrote:

> Please let me know how to display all the characters in my browser.

You didn't tell us which characters are not displayed correctly, but ...

> fileid = new String(temp.getBytes(), "UTF-8");

this code is harmful.

> <%@ page contentType="text/html; charset=UTF-8" %>

The above page directive instructs the JSP engine to convert Java Strings into UTF-8 on output. It is unnecessary to fiddle with character encodings like this in JSP, unlike in a Servlet.

--- Shigemichi Yazawa [EMAIL PROTECTED]
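One way to see why the quoted conversion is harmful is to apply it to a String that is already correct. The sketch below is an illustration under assumptions: ISO-8859-1 stands in for the platform default charset that getBytes() would use, and the class and method names (HarmfulReencodeSketch, reencode) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ISO-8859-1 stands in for the platform default charset.
class HarmfulReencodeSketch {
    // Applies the byte round trip to a String that needs no fixing.
    static String reencode(String alreadyCorrect) {
        // Characters outside ISO-8859-1 cannot be encoded and are replaced.
        byte[] lossy = alreadyCorrect.getBytes(StandardCharsets.ISO_8859_1);
        return new String(lossy, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three kanji, written as escapes
        String damaged = reencode(text);
        System.out.println(damaged.equals(text));         // false: kanji are lost
        System.out.println(reencode("abc").equals("abc")); // true: ASCII survives
    }
}
```

Because ASCII text passes through unchanged, this kind of damage tends to go unnoticed until non-ASCII characters are used.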
Re: Fun with UDCs in Shift-JIS
At Thu, 17 Jan 2002 13:51:07 -0500 (EST), Thomas Chan [EMAIL PROTECTED] wrote:

>> The W3C has a page about the problems with Japanese charset identifiers and mapping tables.
>
> URL?

Probably this: http://www.w3.org/TR/japanese-xml/

--- Shigemichi Yazawa [EMAIL PROTECTED]
Re: Character display problem example
At Sat, 22 Dec 2001 14:35:21 -0600 (CST), [EMAIL PROTECTED] wrote:

> Tomohiro KUBOTA says, in the Debian introduction to i18n:
>
> An example of Han Unification is available at U+9AA8. This is a Kanji character for 'bone'. U+8FCE is another example of a Kanji character for 'welcome'. The part from left side to bottom side is 'run' radical. 'Run' radical is used for many Kanjis and all of them have the same problem. U+76F4 is another example of a Kanji character for 'straight'. I, a native Japanese speaker, cannot recognize Chinese version at all.

I agree with him that the Kanji 'straight' shown in the Unicode book is not recognizable even to well-educated Japanese people.

I don't understand what's wrong with the shape of the 'run' radical. Its shape is certainly different from what you would learn at elementary school, but it's easily recognizable and commonly used in printed forms.

--- Shigemichi Yazawa [EMAIL PROTECTED]
Re: Character display problem example
At Sat, 22 Dec 2001 14:04:32 -0500 (EST), Thomas Chan [EMAIL PROTECTED] wrote:

> Yes, there are simply font differences. The latter form, with the diagonal strokes arranged like / \, is the more canonical form, typically seen in printing when using the kinds of fonts that you tested with. However, the former form, with the diagonal strokes positioned like \ /, is more of a handwritten form, although you may see it in fonts that more resemble handwriting, like the brush-like kaishu(zh)/kaisho(ja) styles (which were not represented in a limited font survey).
>
> Both forms are fine in Traditional Chinese practice. PRC practice (i.e., Simplified Chinese) tends to have made even the printing forms resemble the handwritten form, although I do not doubt that a Simplified Chinese reader would accept the / \ form too. I won't presume to speak for Japanese and Koreans, but I suspect the two forms are interchangeable for them too (comments, please).

Yes, they are interchangeable in Japan, too. Although the \ / form is much more commonly used in both printed and handwritten forms, there is no problem recognizing the / \ form. I would say it's just a glyph difference.

--- Shigemichi Yazawa [EMAIL PROTECTED]
Re: japanese xml
At Wed, 29 Aug 2001 18:13:41 +1000, Viranga Ratnaike [EMAIL PROTECTED] wrote:

> I was hunting for examples of japanese xml and came across the following, which looks rather cool. Except that it doesn't seem to actually be unicode. I thought XML had mandated unicode?
>
> http://java.sun.com/xml/jaxp-1.1/examples/samples/weekly-euc-jp.xml

Try http://www.fxis.co.jp/DMS/sgml/cafe/saloon/charset.zip. It's got Japanese XML documents in UTF-16, UTF-8 and ISO-2022-JP. Sun's example must be based on these documents by Mr. Murata.

--- Shigemichi Yazawa [EMAIL PROTECTED]
RE: Is there Unicode mail out there?
At Thu, 19 Jul 2001 13:11:35 -0500, Ayers, Mike [EMAIL PROTECTED] wrote:

> I'm proposing it as a convention, not a proprietary solution. I agree that a standard solution would be preferred, especially Martin's suggestion of permitting the escape codes but not the characters. I proposed the markup as a workaround until a better solution could be found.

This sounds good. Can we submit a proposal to W3C? I believe it would help many people.

- Shigemichi Yazawa [EMAIL PROTECTED]
Re: Is there Unicode mail out there?
At Thu, 19 Jul 2001 15:52:39 +0900, Martin Duerst [EMAIL PROTECTED] wrote:

> Of course then pattern restrictions on mixed content (which we currently don't have) would become really helpful.

Martin,

What kind of pattern restrictions would become necessary if C0 NCRs were introduced? Something like this?

&#x1b;$B

--- Shigemichi Yazawa [EMAIL PROTECTED]
Re: Is there Unicode mail out there?
At Sat, 14 Jul 2001 09:49:30 -0700, Mark Davis [EMAIL PROTECTED] wrote:

> No, but it is for the vast majority. Some have to be written specially, e.g. &lt;

I looked at the XML 1.0 spec, and it says in 2.4 Character Data and Markup that "If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively." I also looked at the HTML 4.01 spec, and it doesn't say in 5.3.2 Character entity references that &#60; cannot be used to represent <.

> Some cannot be written at all, e.g. U+0007 (but U+0087 can be!)

This is true for XML, but I couldn't find any statement in the HTML 4.01 spec that restricts the use of U+0007 in an HTML document.

By the way, I have been pondering why, in XML, all the C1 control characters are legal but some of the C0 control characters are not. 2.2 Characters says that "Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646." and the BNF for Char is this:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

Does this mean C0 controls are not legal Unicode characters?

--- Shigemichi Yazawa [EMAIL PROTECTED]
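The Char production from XML 1.0 section 2.2 can be written out as a simple range check, which makes the U+0007 versus U+0087 difference easy to verify. This is a minimal sketch; the class and method names (XmlCharSketch, isXmlChar) are hypothetical and not part of any XML library.

```java
// Hypothetical sketch of the XML 1.0 Char production as a predicate.
class XmlCharSketch {
    // Returns true if the code point matches production [2] Char.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x07)); // false: C0 control (BEL)
        System.out.println(isXmlChar(0x87)); // true: C1 control
        System.out.println(isXmlChar(0x09)); // true: tab is explicitly allowed
    }
}
```

Of the C0 controls only tab, line feed and carriage return pass the check, while the whole C1 range falls inside [#x20-#xD7FF].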
RE: UTF-16 problems
At Mon, 11 Jun 2001 15:43:42 -0700, Carl W. Brown [EMAIL PROTECTED] wrote:

> At first I thought the same thing but I have changed my mind. There are problems but the problems are with UTF-16 not UTF-8.

I don't think your new UTF-16 proposal solves any problem. It's yet another encoding. It won't replace the existing UTF-16.

The right thing to do is to sort in order of Unicode scalar value, regardless of the encoding. Period. An encoding (such as UTF-8S) whose only reason for existence is to produce the same binary sort result as another encoding seems so silly.

- Shigemichi Yazawa [EMAIL PROTECTED]
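Sorting by Unicode scalar value, independently of the encoding, can be sketched as below. This is an illustration only. The class and method names (CodePointOrderSketch, compareByCodePoint) are hypothetical, and the comparison assumes well-formed UTF-16 input. It contrasts with String.compareTo, which compares UTF-16 code units and therefore orders supplementary characters before U+E000 through U+FFFF.

```java
// Hypothetical sketch: compares two Strings by code point, not by code unit.
class CodePointOrderSketch {
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The string with code points left over sorts after its prefix.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uE000";        // U+E000
        String supp = "\uD800\uDC00"; // U+10000 as a surrogate pair
        System.out.println(bmp.compareTo(supp) > 0);            // true: code unit order
        System.out.println(compareByCodePoint(bmp, supp) < 0);  // true: code point order
    }
}
```

The same ordering falls out of a plain byte comparison of standard UTF-8, so only the UTF-16 binary order disagrees.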
Re: UTF8 vs AL32UTF8
At Fri, 08 Jun 2001 15:56:08 -0700, Jianping Yang [EMAIL PROTECTED] wrote:

>> Looking at your documentation you call UTF-8s UTF8 and standard UTF-8 AL32UTF8. To me this is very misleading.
>
> We clearly documented what character set definition for UTF8 and AL32UTF8 in our manual. If you look at them you should easy map UTF8 to UTF-8S and AL32UTF8 to UTF-8.

This is totally unacceptable from the user's standpoint. You are effectively saying that you can call an encoding by any name as long as you describe it in your manual.

Suppose Oracle decided to call the Shift_JIS encoding EUCJP. A confused customer calls Oracle tech support complaining that their EUC-JP data are not stored correctly. A tech support guy says, "Sir, our EUCJP is not EUC-JP. It's Shift_JIS actually. It's clearly documented in our manual." And do you expect the customer to say, "Doh! Silly me. I should have read the manual. Thank you for your help."?

UTF-8S is not UTF-8. Stop calling it UTF8.

- Shigemichi Yazawa [EMAIL PROTECTED]
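To make the difference between the two encodings concrete, here is a rough sketch. It assumes, as the UTF-8S proposal is generally described, that UTF-8S encodes each UTF-16 code unit separately, so a surrogate pair becomes two 3-byte sequences instead of one 4-byte sequence. The class and method names (Utf8sSketch, encodeUtf8s) are hypothetical, and this is not Oracle's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a UTF-8S style encoder, assuming it encodes
// every UTF-16 code unit (including surrogates) on its own.
class Utf8sSketch {
    static byte[] encodeUtf8s(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int u = s.charAt(i);
            if (u < 0x80) {
                out.write(u);
            } else if (u < 0x800) {
                out.write(0xC0 | (u >> 6));
                out.write(0x80 | (u & 0x3F));
            } else {
                // Surrogates land here and get a 3-byte form each.
                out.write(0xE0 | (u >> 12));
                out.write(0x80 | ((u >> 6) & 0x3F));
                out.write(0x80 | (u & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String supp = "\uD800\uDC00"; // U+10000 as a surrogate pair
        System.out.println(encodeUtf8s(supp).length);                    // 6
        System.out.println(supp.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```

For text limited to the Basic Multilingual Plane the two encoders produce identical bytes, which is part of why sharing a name is so confusing: the difference only appears with supplementary characters.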
Re: Transcriptions of Unicode
At Tue, 12 Dec 2000 10:25:59 -0800 (GMT-0800), Michael (michka) Kaplan [EMAIL PROTECTED] wrote:

> Ok, it happened again. I can send mail to other people and the encoding stays intact. Just the Unicode List is losing it. Does anyone have any ideas on this?

I think that's because the list server strips off almost all the mail header information. The server should retain the MIME-Version: and Content-Type: headers to allow mail clients to display the message in the right encoding. It would be even better if the server retained the In-Reply-To: header so that I can view the messages as threads.

--- Shigemichi Yazawa [EMAIL PROTECTED]