On Wed, 28 Mar 2012 17:40:58 +0200, Philip Jägenstedt
wrote:
Making big5 and big5-hkscs aliases sounds like a good idea, on the
assumption that big5-hkscs is a pure extension of Big5.
I believe they are not, but given that a) Windows treats them identically
and b) Windows reportedly has no different default setup for Hong Kong and
Taiwan users (and no longer offers an HKSCS download), they can probably be
considered the same.
For more details on Windows and Internet Explorer, see:
http://lists.w3.org/Archives/Public/www-archive/2012Mar/thread.html#msg46
To make this more concrete, here are a few fairly common characters that
I think are in big5-hkscs but not in big5, along with their Unicode code
points and their byte representations in big5-hkscs when converted using
Python:
啫 U+556B '\x94\xdc'
嗰 U+55F0 '\x9d\xf5'
嘅 U+5605 '\x9d\xef'
I'm not sure how to use big5.json, so perhaps you can tell me what these
map to in various browsers? If they're all the same, examples of byte
sequences that don't map consistently would be interesting.
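For reference, the byte sequences above can be reproduced with Python's built-in `big5hkscs` codec. The following is a minimal Python 3 sketch. It assumes the codec tables are the same as in the Python version used for the original conversion. It reports, rather than assumes, whether the plain `big5` codec can encode each character. The names `SAMPLES` and `check` are only illustrative.

```python
# The three characters listed above, with the big5-hkscs bytes quoted for them.
SAMPLES = {
    "\u556b": b"\x94\xdc",  # 啫
    "\u55f0": b"\x9d\xf5",  # 嗰
    "\u5605": b"\x9d\xef",  # 嘅
}

def check(char: str, expected: bytes) -> None:
    """Encode with big5hkscs, confirm the round trip, and try plain big5."""
    encoded = char.encode("big5hkscs")
    decoded = encoded.decode("big5hkscs")
    try:
        plain = char.encode("big5").hex()
    except UnicodeEncodeError:
        plain = "not encodable"
    print(f"U+{ord(char):04X} big5hkscs={encoded.hex()} "
          f"matches={encoded == expected} round_trip={decoded == char} "
          f"big5={plain}")

for char, expected in SAMPLES.items():
    check(char, expected)
```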
big5.json is the result of outputting all possible lead/trail byte
combinations and then running charCodeAt over the resulting string, while
accounting for surrogates and working around a minor problem in Opera.
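The layout of big5.json can be inferred from the index arithmetic in the script below. For each lead byte, the trail bytes 0x40-0x7E come first, followed by 0xA1-0xFE, giving 157 entries per lead byte. The following Python sketch assumes exactly that ordering, with lead bytes running from 0x81 to 0xFE; the upper bound is an assumption. It enumerates the pairs and checks that each pair's position agrees with the formula. The original generator ran in each browser and is not reproduced here. `index_of` and `all_pairs` are hypothetical helper names.

```python
# Trail byte ranges assumed by the index formula: 0x40-0x7E, then 0xA1-0xFE.
TRAILS = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
LEADS = range(0x81, 0xFF)  # assumed lead byte range 0x81-0xFE
ROW = 0xFE - 0xA1 + 0x7E - 0x40 + 2  # 157 trail bytes per lead byte

def index_of(lead: int, trail: int) -> int:
    """Position of a lead/trail pair in the assumed enumeration order."""
    if trail > 0x7E:
        cell = trail - 0xA1 + 0x7E - 0x40 + 1
    else:
        cell = trail - 0x40
    return (lead - 0x81) * ROW + cell

def all_pairs() -> list[tuple[int, int]]:
    """Every lead/trail combination, in the order assumed above."""
    return [(lead, trail) for lead in LEADS for trail in TRAILS]

pairs = all_pairs()
assert len(TRAILS) == ROW
# The formula must agree with the enumeration position for every pair.
assert all(index_of(l, t) == i for i, (l, t) in enumerate(pairs))
print(len(pairs), index_of(0x9D, 0xF5))  # prints: 19782 4543
```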
Running the following (Python):
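import json

# big5.json maps each browser name to a list of code points, one per
# lead/trail pair: lead bytes from 0x81, trail bytes 0x40-0x7E then 0xA1-0xFE.
with open("big5.json") as f:
    data = json.load(f)

lead = 0x9D
trail = 0xF5
row = 0xFE - 0xA1 + 0x7E - 0x40 + 2  # 157 trail bytes per lead byte
if trail > 0x7E:
    cell = trail - 0xA1 + 0x7E - 0x40 + 1
else:
    cell = trail - 0x40
index = (lead - 0x81) * row + cell
for browser in data:
    print(browser, hex(data[browser][index]))
I get: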
opera-hk 0x55f0
firefox 0x9c1f
chrome 0xecd7
firefox-hk 0x55f0
opera 0xfffd
chrome-hk 0x55f0
internetexplorer 0xecd7
This indicates that browsers agree for big5-hkscs and not at all for big5.
Your other examples give similar results.
It seems fairly obvious that the sanest solution would be to simply use
a more correct mapping that doesn't involve the PUA (Private Use Area), but:
1. What is the compatible subset of all browsers?
2. Does that subset include anything mapping to the PUA?
This depends on whether or not you include the big5-hkscs results. Opera
never maps to the PUA, but whether that is compatible enough is unclear.
3. Do Hong Kong or Taiwan sites depend on charCodeAt returning values in
the PUA?
4. Would hacks be needed on the font-loading side if browsers started
using a more correct mapping?
Don't know.
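Questions 1 and 2 could be answered mechanically from big5.json. The following is a hedged sketch. It assumes the file is a JSON object that maps browser names (such as "opera" or "chrome-hk") to equal-length lists of code points, as the script above implies. `compatible_subset`, `pua_entries` and `report` are hypothetical helpers. The caller decides which browsers to include, with or without the -hk variants. Agreement on U+FFFD is counted like any other value.

```python
import json

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

def compatible_subset(data: dict[str, list[int]],
                      browsers: list[str]) -> dict[int, int]:
    """Indices where every listed browser returns the same code point."""
    columns = [data[name] for name in browsers]
    agreed = {}
    for index, values in enumerate(zip(*columns)):
        if len(set(values)) == 1:
            agreed[index] = values[0]
    return agreed

def pua_entries(subset: dict[int, int]) -> dict[int, int]:
    """The part of a subset that maps into the Private Use Area."""
    return {i: cp for i, cp in subset.items() if PUA_START <= cp <= PUA_END}

def report(path: str = "big5.json") -> None:
    """Print agreement and PUA counts for the plain and -hk browser groups."""
    with open(path) as f:
        data = json.load(f)
    for group in (["opera", "firefox", "chrome", "internetexplorer"],
                  ["opera-hk", "firefox-hk", "chrome-hk"]):
        subset = compatible_subset(data, group)
        print(group, len(subset), "agreed,", len(pua_entries(subset)), "in PUA")
```

Calling `report()` next to big5.json prints one line per group.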
Mozilla has done a number of interesting things here that nobody else does,
but all of that dates from 2005 or earlier.
https://bugzilla.mozilla.org/show_bug.cgi?id=9686
https://bugzilla.mozilla.org/show_bug.cgi?id=310299
How relevant that is today, given that they are not the market leader
there, is unclear.
Given the information from Microsoft referenced at the start of this email,
I think simply following Internet Explorer here may be the best way
forward, combined with strongly discouraging the use of big5.
--
Anne van Kesteren
http://annevankesteren.nl/