Re: Re:Some Unicode Characters are not displayed in my browser(IE 5.0)

2002-06-16 Thread Shigemichi Yazawa

At Sun, 16 Jun 2002 03:37:43 +0300,
Altug B. Altintas [EMAIL PROTECTED] wrote:
   i shouldn't use UTF-8 or UTF-16 while String conversions, like
 
fileid = new String(temp.getBytes(), "UTF-8");

This is a hack you need in your Servlet code to recover the characters
the user submitted in an HTML form.

  or like this
 
  byte[] byteArray = someString.getBytes("UTF-16");

This code itself isn't wrong at all. 

My point was that you don't need to do any of this in your JSP
code. The page directive takes care of it instead.
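
For illustration, the Servlet-side hack can be sketched in plain Java
as below. This is only a sketch under assumptions: it assumes the
container decoded the UTF-8 form bytes as ISO-8859-1, and the names
FormParamFix and recover are made up for the example. It also names
ISO-8859-1 explicitly, whereas the no-argument getBytes() in the quoted
code uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;

final class FormParamFix {
    // Hypothetical helper: undo a wrong ISO-8859-1 decode of UTF-8 bytes.
    static String recover(String misdecoded) {
        // ISO-8859-1 maps chars 0..255 to bytes one-to-one, so this
        // gives back the exact bytes that arrived on the wire.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String submitted = "\u65E5\u672C\u8A9E"; // what the user typed
        byte[] wire = submitted.getBytes(StandardCharsets.UTF_8);
        // Simulate a container that assumes ISO-8859-1 (an assumption).
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(recover(misdecoded).equals(submitted)); // prints true
    }
}
```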

---
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: Some Unicode Characters are not displayed in my browser(IE 5.0)

2002-06-15 Thread Shigemichi Yazawa

At Sat, 15 Jun 2002 17:13:52 +0530,
Sreedhar.M [EMAIL PROTECTED] wrote:

 Please let me know how to display all the characters in my browser.

You didn't tell us which characters are not displayed correctly, but ...

fileid = new String(temp.getBytes(), "UTF-8");

this code is harmful. 

<%@ page contentType="text/html; charset=UTF-8" %>

The above page directive instructs the JSP engine to encode Java
Strings as UTF-8 in the response. Unlike in a Servlet, there is no
need to fiddle with character encodings like this in a JSP.
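
To illustrate the harm, here is a small sketch of what happens when
that conversion is applied to a String that is already correct. It
assumes, purely for illustration, that the bytes are taken with
ISO-8859-1. The quoted code uses the platform default charset, so the
actual damage depends on the machine. The class and method names are
made up.

```java
import java.nio.charset.StandardCharsets;

final class DoubleConversion {
    // Mimics new String(temp.getBytes(), "UTF-8"), but with an explicit
    // source charset (an assumption) so the result is reproducible.
    static String reconvert(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "\u65E5\u672C\u8A9E"; // already a valid Java String
        String damaged = reconvert(correct);
        // Characters that ISO-8859-1 cannot represent become '?'.
        System.out.println(damaged);                 // prints ???
        System.out.println(damaged.equals(correct)); // prints false
    }
}
```

Plain ASCII text survives the round trip, which is why the problem
often goes unnoticed until non-ASCII input shows up.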

---
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: Fun with UDCs in Shift-JIS

2002-01-17 Thread Shigemichi Yazawa

At Thu, 17 Jan 2002 13:51:07 -0500 (EST),
Thomas Chan [EMAIL PROTECTED] wrote:
  The W3C has a page about the problems with Japanese charset
  identifiers and mapping tables.
 
 URL?
 

Probably this:
http://www.w3.org/TR/japanese-xml/

---
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: Character display problem example

2001-12-27 Thread Shigemichi Yazawa

At Sat, 22 Dec 2001 14:35:21 -0600 (CST),
[EMAIL PROTECTED] wrote:
 
 Tomohiro KUBOTA says, in the Debian introduction to i18n:
 
 An example of Han Unification is available at U+9AA8. This is a
 Kanji character for 'bone'. U+8FCE is an another example of a Kanji
 character for 'welcome'. The part from left side to bottom side is
 'run' radical. 'Run' radical is used for many Kanjis and all of them
 have the same problem. U+76F4 is an another example of a Kanji
 character for 'straight'. I, a native Japanese speaker, cannot
 recognize Chinese version at all.

I agree with him that the Kanji for 'straight' shown in the Unicode
book is not recognizable even to well-educated Japanese people.
However, I don't understand what's wrong with the shape of the 'run'
radical. Its shape is certainly different from what you would learn at
elementary school, but it's easily recognizable and commonly used in
printed forms.

---
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: Character display problem example

2001-12-27 Thread Shigemichi Yazawa

At Sat, 22 Dec 2001 14:04:32 -0500 (EST),
Thomas Chan [EMAIL PROTECTED] wrote:
 Yes, there are simply font differences.  The latter form, with the
 diagonal strokes arranged like / \, is the more canonical form, typically
 seen in printing when using the kinds of fonts that you tested with.
 However, the former form, with the diagonal strokes positioned like \ /,
 is more of a handwritten form, although you may see it in fonts that more
 resemble handwriting, like the brush-like kaishu(zh)/kaisho(ja) styles
 (which were not represented in a limited font survey).  Both forms are
 fine in Traditional Chinese practice.  PRC practice (i.e., Simplified
 Chinese) tends to have made even the printing forms resemble the
 handwritten form, although I do not doubt that a Simplified Chinese
 reader would accept the / \ form too.  I won't presume to speak for
 Japanese and Koreans, but I suspect the two forms are interchangeable for
 them too (comments, please).

Yes, they are interchangeable in Japan, too. Although the \ / form is
much more commonly used in both printed and handwritten forms, there
is no problem recognizing the / \ form. I would say it's just a glyph
difference.

---
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: japanese xml

2001-08-29 Thread Shigemichi Yazawa

At Wed, 29 Aug 2001 18:13:41 +1000,
Viranga Ratnaike [EMAIL PROTECTED] wrote:
   I was hunting for examples of japanese xml and came across the
   following, which looks rather cool.  Except that it doesn't seem
   to actually be unicode.  I thought XML had mandated unicode?
 
   http://java.sun.com/xml/jaxp-1.1/examples/samples/weekly-euc-jp.xml

Try http://www.fxis.co.jp/DMS/sgml/cafe/saloon/charset.zip.

It contains Japanese XML documents in UTF-16, UTF-8 and ISO-2022-JP.
Sun's example must be based on these documents by Mr. Murata.

---
Shigemichi Yazawa
[EMAIL PROTECTED]




RE: Is there Unicode mail out there?

2001-07-20 Thread Shigemichi Yazawa

At Thu, 19 Jul 2001 13:11:35 -0500,
Ayers, Mike [EMAIL PROTECTED] wrote:
   I'm proposing it as a convention, not a proprietary solution.  I
 agree that a standard solution would be preferred, especially Martin's
 suggestion of permitting the escape codes but not the characters.  I
 proposed the markup as a workaround until a better solution could be found.

This sounds good. Can we submit a proposal to the W3C? I believe it
would help many people.

-
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: Is there Unicode mail out there?

2001-07-19 Thread Shigemichi Yazawa

At Thu, 19 Jul 2001 15:52:39 +0900,
Martin Duerst [EMAIL PROTECTED] wrote:
 Of course then pattern restrictions on mixed content (which we
 currently don't have) would become really helpful.

Martin,

What kind of pattern restrictions would become necessary if C0 NCRs
were introduced? Something like this? &#x1b;$B

---
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: Is there Unicode mail out there?

2001-07-16 Thread Shigemichi Yazawa

At Sat, 14 Jul 2001 09:49:30 -0700,
Mark Davis [EMAIL PROTECTED] wrote:
 
 No, but it is for the vast majority.
 
 Some have to be written specially, e.g. &lt;

I looked at the XML 1.0 spec, and section 2.4 Character Data and
Markup says:

If they are needed elsewhere, they must be escaped using either
numeric character references or the strings "&amp;" and "&lt;"
respectively.

I also looked at the HTML 4.01 spec, and section 5.3.2 Character
entity references does not say that &#60; cannot be used to represent <.

 Some cannot be written at all, e.g. U+0007 (but U+0087 can be!)

This is true for XML, but I couldn't find any statement in the HTML
4.01 spec that restricts the use of U+0007 in an HTML document.

By the way, I have been pondering why, in XML, all the C1 control
characters are legal but some of the C0 control characters are
not. 2.2 Characters says that "Legal characters are tab, carriage
return, line feed, and the legal characters of Unicode and ISO/IEC
10646." and the BNF for Char is this.

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
           | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks,
       FFFE, and FFFF. */

Does this mean C0 controls are not legal Unicode characters?
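
To make the asymmetry concrete, the Char production of XML 1.0 can be
written as a small Java predicate. This is just a sketch. The class and
method names are made up, and the ranges are those of production [2] in
the XML 1.0 spec.

```java
final class XmlChars {
    // XML 1.0 production [2]:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x0007)); // C0 control BEL: false
        System.out.println(isXmlChar(0x0087)); // C1 control: true
        System.out.println(isXmlChar(0x0009)); // tab: true
        System.out.println(isXmlChar(0xFFFE)); // excluded: false
    }
}
```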

---
Shigemichi Yazawa
[EMAIL PROTECTED]




RE: UTF-16 problems

2001-06-12 Thread Shigemichi Yazawa

At Mon, 11 Jun 2001 15:43:42 -0700,
Carl W. Brown [EMAIL PROTECTED] wrote:
 At first I thought the same thing but I have changed my mind.  There are
 problems but the problems are with UTF-16 not UTF-8.

I don't think your new UTF-16 proposal solves any problem. It's yet
another encoding. It won't replace the existing UTF-16. The right
thing to do is to sort in order of Unicode scalar value regardless of
the encoding. Period. An encoding (such as UTF-8S) whose only reason
for existence is to produce the same binary sort order as another
encoding seems so silly.
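
As a rough sketch of sorting by scalar value regardless of encoding,
the comparator below walks two Java Strings by code point instead of by
UTF-16 code unit. The class and method names are made up for the
example, and it assumes well-formed input (no unpaired surrogates).

```java
final class CodePointOrder {
    // Compares two strings by Unicode code point, not by UTF-16 code unit.
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";        // U+FF5E, a single code unit
        String supp = "\uD800\uDC00"; // U+10000, a surrogate pair
        // String.compareTo compares code units, so U+10000 sorts first.
        System.out.println(bmp.compareTo(supp) > 0);           // prints true
        // By code point, U+FF5E sorts before U+10000.
        System.out.println(compareByCodePoint(bmp, supp) < 0); // prints true
    }
}
```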

-
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: UTF8 vs AL32UTF8

2001-06-11 Thread Shigemichi Yazawa

At Fri, 08 Jun 2001 15:56:08 -0700,
Jianping Yang [EMAIL PROTECTED] wrote:
  Looking at your documentation you call UTF-8s UTF8 and standard UTF-8
  AL32UTF8.  To me this is very misleading.
 
 
 We clearly documented what character set definition for UTF8 and AL32UTF8 in our
 manual. If you look at them you should easy map UTF8 to UTF-8S and AL32UTF8 to
 UTF-8.

This is totally unacceptable from a user's standpoint. You are
effectively saying that you can call an encoding by any name as long
as you describe it in your manual.

Suppose Oracle decided to call the Shift_JIS encoding EUCJP. A confused
customer calls Oracle tech support complaining that their EUC-JP data
are not stored correctly. A tech support guy says, "Sir, our EUCJP is
not EUC-JP. It's Shift_JIS actually. It's clearly documented in our
manual." And do you expect the customer to say, "Doh! Silly me. I
should have read the manual. Thank you for your help."?

UTF-8S is not UTF-8. Stop calling it UTF8. 
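
A small sketch of the difference at the byte level follows. It assumes
that UTF-8S encodes each UTF-16 surrogate as its own three-byte
sequence, which is my understanding of the proposal and is not stated
in this thread. The class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SurrogateWiseUtf8 {
    // Encodes every UTF-16 code unit separately (assumed UTF-8S behaviour),
    // so a surrogate pair becomes two three-byte sequences.
    static byte[] encodePerCodeUnit(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int u = s.charAt(i);
            if (u < 0x80) {
                out.write(u);
            } else if (u < 0x800) {
                out.write(0xC0 | (u >> 6));
                out.write(0x80 | (u & 0x3F));
            } else {
                out.write(0xE0 | (u >> 12));
                out.write(0x80 | ((u >> 6) & 0x3F));
                out.write(0x80 | (u & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String supp = "\uD800\uDC00"; // U+10000
        byte[] utf8 = supp.getBytes(StandardCharsets.UTF_8); // F0 90 80 80
        byte[] perUnit = encodePerCodeUnit(supp);            // ED A0 80 ED B0 80
        System.out.println(utf8.length);    // prints 4
        System.out.println(perUnit.length); // prints 6
    }
}
```

For text without supplementary characters the two byte sequences are
identical, so the mismatch only shows up once such characters are
stored.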

-
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: Transcriptions of Unicode

2000-12-12 Thread Shigemichi Yazawa

At Tue, 12 Dec 2000 10:25:59 -0800 (GMT-0800),
Michael (michka) Kaplan [EMAIL PROTECTED] wrote:
 Ok, it happened again. I can send mail to other people and the encoding
 stays intact. Just the Unicode List is losing it. Does anyone have any ideas
 on this?

I think that's because the list server strips off almost all the mail
header information. The server should retain the

MIME-Version: 
Content-Type: 

headers to allow mail clients to display the message in the right
encoding.

It would be even better if the server retained the In-Reply-To: header
so that I can view the messages by thread.

---
Shigemichi Yazawa
[EMAIL PROTECTED]