On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote:
> On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen <[email protected]> wrote:
>
>> [1] <http://coq.no/character-tables/eten1.pdf>
>> <http://coq.no/character-tables/eten1.js>
>
> What is the source for the mappings in eten1.pdf?
Unihan H was considered normative for the 35 characters it covers:
<http://coq.no/character-tables/u-eten1.pdf>
<http://coq.no/character-tables/u-eten1.js>
The remaining Unicode mappings are mostly straightforward (given a printed
table showing the glyphs).
On 9 Apr 2012, at 02:08, Øistein E. Andersen wrote:
> On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote:
>
>> On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen <[email protected]> wrote:
>>
>>> On Fri Apr 6 06:42:26 PDT 2012, Philip Jägenstedt <philipj at opera.com>
>>> wrote:
>>>
>>>> Also, a single mapping fails the Big5-contra[di]ction test:
>>>>
>>>> F9FE =>
>>>> opera-hk: U+FFED ■
>>>> firefox: U+2593 ▓
>>>> chrome: U+2593 ▓
>>>> firefox-hk: U+2593 ▓
>>>> opera: U+2593 ▓
>>>> chrome-hk: U+FFED ■
>>>> internetexplorer: U+2593 ▓
>>>> hkscs-2008: <U+FFED> ■
>>>>
>>>> I'd say that we should go with U+FFED here, since that's what the
>>>> [HKSCS-2008] spec
>>>> says and it's visually close anyway.
>>
>> [...]
>
> Lunde (if I remember correctly, 1st Edn) and Kano's 'Developing International
> Software' (1st Edn, 1995) both show something like U+2593, but it could of
> course be that popular non-Unicode (HK) Big5 fonts had glyphs more like
> U+FFED, which would make the HKSCS-2008 mapping less surprising. [...]
I was misremembering: Lunde actually shows a solid black square, so it looks
as though Microsoft may have changed this in CP950 and HKSCS-2008 restored the
original meaning. [U+FFED does not seem quite right (half-width looks
implausible), but let us not start discussing all the different black solid
squares in Unicode.]
Given the above, following HKSCS-2008 appears to be the best solution, which
brings the number of problematic forward mappings down to one.
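
For reference, the disagreement on F9FE quoted above can be tallied with a few lines of Python. This is only a sketch: the data are copied from the quoted test results, and the names F9FE_RESULTS and group_by_code_point are made up for this illustration.

```python
from collections import defaultdict

# Code point produced for Big5 F9FE by each implementation,
# copied from the test results quoted earlier in this message.
F9FE_RESULTS = {
    "opera-hk": 0xFFED,
    "firefox": 0x2593,
    "chrome": 0x2593,
    "firefox-hk": 0x2593,
    "opera": 0x2593,
    "chrome-hk": 0xFFED,
    "internetexplorer": 0x2593,
    "hkscs-2008": 0xFFED,
}


def group_by_code_point(results):
    """Return {code point: sorted list of implementations mapping to it}."""
    groups = defaultdict(list)
    for implementation, code_point in results.items():
        groups[code_point].append(implementation)
    return {cp: sorted(names) for cp, names in groups.items()}


groups = group_by_code_point(F9FE_RESULTS)
for cp, names in sorted(groups.items()):
    print(f"U+{cp:04X}: {', '.join(names)}")

# More than one group means the forward mapping is contradictory.
print("contradiction" if len(groups) > 1 else "consistent")
```

The HK variants of Opera and Chrome side with HKSCS-2008, whereas everything else follows the CP950-style mapping.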
>>> Duplicates and reverse mappings:
>>
>> [...]
>>
>> These are the ones where you (Øistein) disagree:
>>
>> [...]
>>
>>> F9E9 <= U+255E
>>> F9EA <= U+256A
>>> F9EB <= U+2561
>>> F9F9 <= U+2550
>>
>> Python's big5-hkscs agrees, but Python's big5 does this instead:
>>
>> A2A5 <= U+255E
>> A2A6 <= U+256A
>> A2A7 <= U+2561
>> A2A4 <= U+2550
>
> [...]
These are line-drawing characters with two horizontal lines.
Four such characters are included in the original unextended Big5 (A2xx).
Lunde (and several Big5-based fonts on my machine) show glyphs with the two
horizontal lines quite far apart.
In contrast, the full set of line-drawing characters with double lines added by
E-Ten (F9xx) have glyphs where the two lines are quite close to each other
(both in Lunde and in contemporary fonts with a full set of such line-drawing
characters).
One potential problem with mapping U+255E to A2A5 etc. is that a non-Unicode
system will show glyphs that do not align with other line-drawing characters.
One potential problem with mapping to F9E9 etc. is that systems without support
for this E-Ten extension will show nothing at all.
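
The two reverse-mapping choices can be inspected with Python's own codecs, which Philip mentioned above. The following is a rough sketch rather than a definitive test: "big5" and "big5hkscs" are the standard-library codec names, the helper names are invented for this example, and the bytes printed depend on the Python version in use, so no particular output is claimed here.

```python
# The four code points with two horizontal lines discussed above.
CODE_POINTS = [0x255E, 0x256A, 0x2561, 0x2550]

# Standard-library codec names; other Big5 variants could be added.
CODECS = ["big5", "big5hkscs"]


def encode_or_none(code_point, codec):
    """Encode a single code point; return None if the codec has no mapping."""
    try:
        return chr(code_point).encode(codec)
    except UnicodeEncodeError:
        return None


def reverse_mappings(code_points, codecs):
    """Return {code point: {codec: encoded bytes or None}}."""
    return {
        cp: {codec: encode_or_none(cp, codec) for codec in codecs}
        for cp in code_points
    }


table = reverse_mappings(CODE_POINTS, CODECS)
for cp, row in table.items():
    cells = ", ".join(
        f"{codec}: {encoded.hex().upper() if encoded else 'unmapped'}"
        for codec, encoded in row.items()
    )
    print(f"U+{cp:04X} <= {cells}")
```

A row in which the two codecs give different bytes (A2xx versus F9xx) corresponds to the disagreement described above.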
'Proper' handling would probably require the four characters at A2xx to be
added to Unicode as compatibility characters or variation sequences based on
U+255E etc., but the case does not seem particularly strong unless it can be
shown that line-drawing characters with two horizontal lines relatively far
apart are somehow important.
***
Getting the double-stroked circle segments at F9FB..F9FD added to Unicode would
make it possible to provide Unicode mappings in accordance with the original
intent and remove four duplicate mappings. This might be worthwhile if the
characters have not been proposed and rejected already.
Øistein E. Andersen