On Wed, 28 Mar 2012 17:40:58 +0200, Philip Jägenstedt wrote:
Making big5 and big5-hkscs aliases sounds like a good idea, on the assumption that big5-hkscs is a pure extension of Big5.

I believe it is not, but given that Windows a) treats them identically and b) reportedly has no different default setup for Hong Kong and Taiwan users (and no longer offers an HKSCS download), they can probably be considered the same.

For more details on Windows and Internet Explorer, see: http://lists.w3.org/Archives/Public/www-archive/2012Mar/thread.html#msg46


To make this more concrete, here are a few fairly common characters that I think are in big5-hkscs but not in big5, with their Unicode code points and their byte representations in big5-hkscs as produced by Python:

啫 U+556B '\x94\xdc'
嗰 U+55F0 '\x9d\xf5'
嘅 U+5605 '\x9d\xef'
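
The byte values above can be reproduced with Python's standard codecs. This is only a sketch: it assumes the `big5hkscs` and `big5` codecs bundled with CPython, whose tables need not match what any browser does, and the helper names `encode_or_none` and `describe` are made up for illustration.

```python
# Sketch: encode the three example characters with CPython's bundled codecs.
# Assumption: CPython's "big5hkscs" and "big5" tables; browsers may differ.

EXAMPLES = ["\u556b", "\u55f0", "\u5605"]  # the three characters listed above


def encode_or_none(char, codec):
    """Return the encoded bytes, or None if the codec cannot encode char."""
    try:
        return char.encode(codec)
    except UnicodeEncodeError:
        return None


def describe(chars=EXAMPLES):
    """Return (code point, big5-hkscs bytes, big5 bytes or None) per character."""
    return [
        (
            "U+%04X" % ord(ch),
            encode_or_none(ch, "big5hkscs"),
            encode_or_none(ch, "big5"),
        )
        for ch in chars
    ]


if __name__ == "__main__":
    for row in describe():
        print(*row)
```

A `None` in the last column would mean the plain `big5` codec has no mapping for that character, which is what the claim above predicts.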

I'm not sure how to use big5.json, so perhaps you can tell me what these map to in various browsers? If they all map the same way, examples of byte sequences that don't would be interesting.

big5.json is the result of outputting all possible lead/trail byte combinations and then running charCodeAt over the resulting string, while accounting for surrogates and working around a minor problem in Opera. Running the following (Python):

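import json

# Each key is a browser label; each value is a flat list of code points
# indexed by lead/trail byte position.
with open("big5.json", "r") as f:
    data = json.load(f)

lead = 0x9D
trail = 0xF5

# Trail bytes per lead byte: 0x40-0x7E plus 0xA1-0xFE (157 in total)
row = 0xFE-0xA1 + 0x7E-0x40 + 2
# Position of the trail byte within its row
cell = (trail-0xA1 + 0x7E-0x40 + 1) if trail > 0x7E else trail - 0x40
index = (lead-0x81) * row + cell

for x in data:
    print(x, hex(data[x][index]))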

I get

opera-hk 0x55f0
firefox 0x9c1f
chrome 0xecd7
firefox-hk 0x55f0
opera 0xfffd
chrome-hk 0x55f0
internetexplorer 0xecd7

indicating browsers agree for big5-hkscs and not at all for big5. Similar results for your other examples.
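
For anyone wanting to look up other byte pairs, the index arithmetic from the script above can be wrapped in a pair of helpers. This is a sketch under the layout that script implies (lead bytes starting at 0x81, trail bytes in 0x40-0x7E and 0xA1-0xFE); `big5_index` and `big5_bytes` are hypothetical names, not part of big5.json or any library.

```python
# Sketch of the lead/trail <-> index arithmetic used for big5.json lookups.
# Assumed layout: lead bytes 0x81-0xFE, trail bytes 0x40-0x7E and 0xA1-0xFE.

LOW_TRAILS = 0x7E - 0x40 + 1     # 63 trail bytes in 0x40-0x7E
HIGH_TRAILS = 0xFE - 0xA1 + 1    # 94 trail bytes in 0xA1-0xFE
ROW = LOW_TRAILS + HIGH_TRAILS   # 157 cells per lead byte
LEADS = 0xFE - 0x81 + 1          # 126 lead bytes


def big5_index(lead, trail):
    """Map a lead/trail byte pair to its position in the flat list."""
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range: %#x" % lead)
    if 0x40 <= trail <= 0x7E:
        cell = trail - 0x40
    elif 0xA1 <= trail <= 0xFE:
        cell = trail - 0xA1 + LOW_TRAILS
    else:
        raise ValueError("trail byte out of range: %#x" % trail)
    return (lead - 0x81) * ROW + cell


def big5_bytes(index):
    """Inverse of big5_index: recover the (lead, trail) byte pair."""
    if not 0 <= index < LEADS * ROW:
        raise ValueError("index out of range: %d" % index)
    lead, cell = divmod(index, ROW)
    if cell < LOW_TRAILS:
        trail = cell + 0x40
    else:
        trail = cell - LOW_TRAILS + 0xA1
    return lead + 0x81, trail


if __name__ == "__main__":
    idx = big5_index(0x9D, 0xF5)
    print(idx, [hex(b) for b in big5_bytes(idx)])
```

With such helpers, `data[x][big5_index(0x94, 0xDC)]` would give the lookup for the first of the example characters.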


It seems fairly obvious that the most sane solution would be to just use a more correct mapping that doesn't involve the PUA, but:

1. What is the compatible subset of all browsers?
2. Does that subset include anything mapping to the PUA?

This depends on whether or not you include big5-hkscs results. Opera never maps to PUA, but whether that is compatible enough is unclear.
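
To see which of the results listed earlier for 0x9D 0xF5 actually land in the PUA, the code points can be classified by range. A rough sketch: the `RESULTS` values are copied from the output earlier in this email, and `classify` is a hypothetical helper that only distinguishes the PUA ranges, U+FFFD, and everything else.

```python
# Sketch: classify the per-browser results for bytes 0x9D 0xF5.
# RESULTS is copied from the output earlier in this email.

RESULTS = {
    "opera-hk": 0x55F0,
    "firefox": 0x9C1F,
    "chrome": 0xECD7,
    "firefox-hk": 0x55F0,
    "opera": 0xFFFD,
    "chrome-hk": 0x55F0,
    "internetexplorer": 0xECD7,
}

# Unicode Private Use Areas: the BMP block plus planes 15 and 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def classify(code_point):
    """Return 'PUA', 'replacement' (U+FFFD), or 'other' for a code point."""
    if any(lo <= code_point <= hi for lo, hi in PUA_RANGES):
        return "PUA"
    if code_point == 0xFFFD:
        return "replacement"
    return "other"


def pua_labels(results=RESULTS):
    """Sorted labels whose result falls in a Private Use Area."""
    return sorted(k for k, v in results.items() if classify(v) == "PUA")


if __name__ == "__main__":
    for label, cp in RESULTS.items():
        print(label, "U+%04X" % cp, classify(cp))
```

Running the same classification over every index in big5.json would be one way to approach questions 1 and 2.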


3. Do Hong Kong or Taiwan sites depend on charCodeAt returning values in the PUA?

4. Would hacks be needed on the font-loading side if browsers started using a more correct mapping?

Don't know.


Mozilla has done a number of interesting things here that nobody else does, but that was all back in '05 or earlier.

https://bugzilla.mozilla.org/show_bug.cgi?id=9686
https://bugzilla.mozilla.org/show_bug.cgi?id=310299

How relevant that is today, given that they are not the market leader there, is unclear.


Given the information from Microsoft referenced at the start of this email, I think just following Internet Explorer here is probably the best way forward, combined with strongly discouraging the use of big5.


--
Anne van Kesteren
http://annevankesteren.nl/
