At 8:39 AM -0800 6/16/00, Magda Danish (Unicode) wrote:
>Got this request by phone and email at the unicode home office.  Could
>anyone respond directly to the list and cc to [EMAIL PROTECTED]
>
>Thanks. Magda.
>
>-----Original Message-----
>From: Ken Buis [mailto:[EMAIL PROTECTED]]
>Sent: Friday, June 16, 2000 9:11 AM
>To: [EMAIL PROTECTED]
>Cc: [EMAIL PROTECTED]
>Subject: UNICODE versus Shift-JIS
>
>
>Hello,
>
>I'm currently researching the effort involved with localizing a medical
>product to Japanese. This product is display-only, no data input is
>involved. Some people are suggesting I translate the user
>interface using the Shift-JIS character set, others support UNICODE.

What is the platform? Java? Windows CE? Something of Agilent's? It 
makes a big difference. What fonts do you have?

>I'd
>like to know how the characters from the two sets map to each other.

See The Unicode Standard, Version 3.0 (Addison-Wesley, 2000), and CJKV 
Information Processing, by Ken Lunde (O'Reilly, 1999), for extensive 
details. In particular, section 15.2 of the standard, the Shift-JIS 
Index (pp. 923-958), begins with these code points:

SJIS  UNICODE
889F  4E9C
88A0  5516
88A1  5A03
88A2  963F
...
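If you want to check pairs like these yourself, one option (my choice of tool for illustration, not anything the standard prescribes) is Python's built-in "shift_jis" codec, which decodes the first four SJIS kanji codes to the Unicode code points shown at the start of the SJIS Index. A minimal sketch:

```python
# Decode two-byte Shift-JIS codes with Python's built-in "shift_jis" codec
# and report the Unicode code point each one maps to.
SJIS_CODES = ["889F", "88A0", "88A1", "88A2"]

def sjis_to_unicode(sjis_hex: str) -> int:
    """Return the Unicode code point for a two-byte Shift-JIS code given in hex."""
    char = bytes.fromhex(sjis_hex).decode("shift_jis")
    return ord(char)

for code in SJIS_CODES:
    # Print in the same "SJIS  UNICODE" layout as the index excerpt.
    print(f"{code}  {sjis_to_unicode(code):04X}")
```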

>For
>example, would character #F123 in the Shift-JIS set be the same as
>character #F123 in the UNICODE set?

Neither standard has a character at that code point, but more 
generally, the answer is No, there is no numeric correspondence, as 
the brief quotation above shows.

>If not, are the characters stored in
>the same order in both sets, but at different offsets within each set?

No. SJIS does not have any simple ordering principle, since it 
derives from encodings that have accreted blocks from several sources 
over time. The order of the original CJK Unified block in Unicode is 
derived from several merged radical/stroke count dictionary orders, 
but other blocks will be added in future.
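The "same order, fixed offset" idea is easy to test on a handful of consecutive SJIS codes. The sketch below again assumes Python's built-in "shift_jis" codec as a stand-in for the published mapping tables; it checks whether the Unicode-minus-SJIS difference is constant and whether the Unicode values keep increasing.

```python
# Test the hypothesis "same order, fixed offset" on a few consecutive
# Shift-JIS kanji codes, decoding each with Python's built-in codec.
def unicode_for(sjis_hex: str) -> int:
    return ord(bytes.fromhex(sjis_hex).decode("shift_jis"))

# Five consecutive Shift-JIS codes at the start of the kanji area.
codes = ["889F", "88A0", "88A1", "88A2", "88A3"]
points = [unicode_for(c) for c in codes]

# A fixed offset would make every (Unicode - SJIS) difference identical.
offsets = {p - int(c, 16) for c, p in zip(codes, points)}
# Preserved order would make the Unicode values strictly increasing.
same_order = all(a < b for a, b in zip(points, points[1:]))

print("distinct offsets:", len(offsets))
print("order preserved:", same_order)
```

For these five codes every offset differs, and the order breaks at 88A3 (哀, U+54C0), which follows 88A2 in SJIS but sorts before U+963F in Unicode.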

>For example, the Shift-JIS set starts at offset 0x000 and their
>equivalent characters start at offset 0xFF00 in the UNICODE set.

Hypothetically, you mean? Actually, the CJK Unified Ideographs block 
starts at 4E00, and the SJIS Kanji start at 889F, and the two begin 
with different characters.
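The mismatch at the very start of each block can be seen directly. This sketch, assuming Python's built-in "shift_jis" codec, encodes U+4E00 (the first CJK Unified Ideograph) into SJIS and decodes SJIS 889F (the first SJIS kanji code) into Unicode:

```python
# Compare the first character of each block: U+4E00 opens the CJK Unified
# Ideographs block; 889F is the first Shift-JIS kanji code.
first_unicode_cjk = chr(0x4E00)
first_sjis_kanji = bytes.fromhex("889F").decode("shift_jis")

sjis_of_4e00 = first_unicode_cjk.encode("shift_jis").hex().upper()
print(f"U+4E00 is {first_unicode_cjk}, which is SJIS {sjis_of_4e00}")
print(f"SJIS 889F is {first_sjis_kanji}, which is U+{ord(first_sjis_kanji):04X}")
```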

>If that
>is true, then character #0002 in the Shift-JIS set would be the same as
>character #FF02 in the UNICODE set.

The Unicode web site and the CD-ROM in the standard both contain 
Shift-JIS/Unicode mapping tables suitable for use in software.
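Loading such a table into a program is straightforward. The sketch below assumes a plain two-column hex layout ("0xSJIS", tab, "0xUNICODE", with "#" starting a comment); check the actual file's header before relying on that, and note that the sample lines are a hypothetical excerpt built from the first four kanji pairs of the SJIS Index.

```python
# Build lookup tables from a Shift-JIS/Unicode mapping file, ASSUMING the
# layout "0xSJIS<TAB>0xUNICODE<TAB># comment", with '#' starting comments.
from typing import Dict, Iterable, Tuple

def load_mapping(lines: Iterable[str]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (sjis_to_unicode, unicode_to_sjis) dictionaries."""
    to_unicode: Dict[int, int] = {}
    to_sjis: Dict[int, int] = {}
    for line in lines:
        data = line.split("#", 1)[0].split()  # drop comment, split columns
        if len(data) < 2:
            continue  # blank or comment-only line
        sjis, ucs = int(data[0], 16), int(data[1], 16)  # "0x" prefix is accepted
        to_unicode[sjis] = ucs
        to_sjis.setdefault(ucs, sjis)  # keep the first SJIS code if duplicated
    return to_unicode, to_sjis

# Hypothetical sample in the assumed layout.
SAMPLE = """\
# Shift-JIS  Unicode
0x889F\t0x4E9C\t# <CJK>
0x88A0\t0x5516\t# <CJK>
0x88A1\t0x5A03\t# <CJK>
0x88A2\t0x963F\t# <CJK>
"""

to_unicode, to_sjis = load_mapping(SAMPLE.splitlines())
print(hex(to_unicode[0x889F]))  # 0x4e9c
print(hex(to_sjis[0x963F]))     # 0x88a2
```

Keeping the first SJIS code for a repeated Unicode value is only one policy; a real product should decide explicitly how to handle codes that round-trip in only one direction.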

>Any assistance would be greatly appreciated.
>
>Ken Buis
>Agilent Technologies
>978-659-4859


Edward Cherlin
Generalist
"A knot!" exclaimed Alice. "Oh, do let me help to undo it."
Alice in Wonderland
