Hi Andy and Makoto,

I've made the change that was requested.  It would be great if someone
could test it--I don't have any genuine MS932-encoded files here...

Thanks,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]




                                                                                
From:     Andy Clark <andyc@apache.org>
Date:     03/25/2002 02:23 AM
To:       [EMAIL PROTECTED]
cc:       [EMAIL PROTECTED]
Subject:  Fwd: Supporting encoding="Windows-31J"
Please respond to xerces-j-dev



I'm forwarding this message for Makoto MURATA regarding an I18N
encoding name that is not in the Xerces2 EncodingMap. Would
someone be able to go to the official IANA site (link is in
the original message below) and update the EncodingMap to include
the names and aliases that we're missing?

-AndyC

-------- Original Message --------
Subject: Supporting encoding="Windows-31J"
Date: Mon, 25 Mar 2002 14:38:08 +0900 (JST)
From: MURATA Makoto <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

Andy,

Could you forward this mail to the developer mailing
list of Xerces-J?  Thanks in advance.

Cheers,

Makoto
------------------------------------------------------------
Supporting encoding="Windows-31J"

MURATA Makoto

XML uses Unicode as the coded character set, and converts all legacy
encodings to Unicode.  Unfortunately, different implementations use
different conversion tables.  In particular, the conversion table
that Microsoft uses for the charset "Shift_JIS" is unique to
Microsoft.

Java provides two encodings for so-called shift-jis.  They are "SJIS"
and "MS932".  "MS932" refers to the Microsoft conversion table and is
very specific to Windows.  Meanwhile, "SJIS" is neutral and is much
closer to the conversion table published by the Unicode Consortium
and JIS.  For more about this issue, see "XML Japanese Profile",
published as a JIS TR and then as a W3C TR.

           http://www.w3.org/TR/japanese-xml/
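
The difference between the two tables can be observed directly from
Java.  The following is only a minimal sketch: the class and method
names are made up for illustration, it assumes a JDK that ships the
extended charsets under the names "Shift_JIS" and "windows-31j" (the
charset names corresponding to "SJIS" and "MS932"), and it uses the
byte pair 0x81 0x60 as the sample.  That pair is commonly cited as
decoding to U+301C (WAVE DASH) with the neutral table and to U+FF5E
(FULLWIDTH TILDE) with the Microsoft table.

```java
import java.nio.charset.Charset;

public class ShiftJisTables {

    // Decode the bytes with the named charset and render each
    // resulting code point as "U+XXXX", separated by spaces.
    static String decodeToCodePoints(byte[] bytes, String charsetName) {
        String decoded = new String(bytes, Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        decoded.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample byte pair chosen for illustration (an assumption,
        // not taken from the message above).
        byte[] sample = { (byte) 0x81, (byte) 0x60 };
        System.out.println("Shift_JIS:   "
                + decodeToCodePoints(sample, "Shift_JIS"));
        System.out.println("windows-31j: "
                + decodeToCodePoints(sample, "windows-31j"));
    }
}
```

If the two lines printed differ, the same bytes in a document declared
as encoding="shift_jis" will yield different characters depending on
which table the parser picks.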

Xerces has always used the Java encoding "SJIS" for XML documents
containing encoding="shift_jis".  Microsoft has always used "MS932".
Since "MS932" is specific to Windows, the use of "SJIS" by Xerces
makes a lot of sense.

Ideally, Microsoft should update their conversion table, but this is
extremely unlikely.  We can hope, however, that they will use a
different charset name, "Windows-31J", which is already registered at
IANA, rather than "Shift_JIS".  Then "Shift_JIS" would always refer
to "SJIS" and "windows-31j" would always refer to "MS932".

           http://www.iana.org/assignments/character-sets

At present, Xerces does not support encoding="windows-31j".  By
supporting "windows-31j", we can encourage people to use
"windows-31j" when they really want to use "MS932".
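
The mapping being asked for could look roughly like the sketch below.
This is a hypothetical, self-contained lookup table, not the actual
Xerces EncodingMap code: the class name, the method name and the
choice of upper-casing the key are all assumptions made for
illustration.  "csWindows31J" is included because it is the alias
listed for Windows-31J in the IANA registry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EncodingAliasSketch {

    // Hypothetical table from upper-cased IANA charset names to the
    // Java encoding names discussed in this message.
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();

    static {
        IANA_TO_JAVA.put("SHIFT_JIS", "SJIS");      // neutral table
        IANA_TO_JAVA.put("WINDOWS-31J", "MS932");   // Microsoft table
        IANA_TO_JAVA.put("CSWINDOWS31J", "MS932");  // IANA alias
    }

    // Look up the Java encoding name for the value of an XML
    // encoding declaration; returns null if the name is unknown.
    static String javaNameFor(String ianaName) {
        return IANA_TO_JAVA.get(ianaName.toUpperCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        System.out.println(javaNameFor("shift_jis"));
        System.out.println(javaNameFor("Windows-31J"));
    }
}
```

Keeping the two names as separate entries means a document author
chooses the conversion table explicitly through the encoding
declaration, instead of the parser guessing.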

Cheers,

Makoto

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




