RE: What to do if a legacy compatibility character is defective?

Peter Constable via Unicode Fri, 24 Oct 2025 11:48:06 -0700

TLDR…

You start by saying what your message is _NOT_ about. It would be helpful to 
have a brief abstract of what it _IS_ about so people can decide whether to 
read a long email.

Peter

From: Unicode <[email protected]> On Behalf Of 
[email protected] via Unicode
Sent: October 24, 2025 5:42 AM
To: unicode <[email protected]>
Subject: What to do if a legacy compatibility character is defective?

No, I'm not talking about U+0149, which was marked as deprecated but is in fact 
a legitimate compatibility character and is not defective as it is the only 
reasonable way to represent the byte 0xF3 in a CP853 character cell.

I am aware that this issue has already been discussed many times before on this 
mailing list, but I still have not received a proper explanation of how exactly 
the existing characters 1FB70—1FB81 1FBB5—1FBB8 1FBBC are intended to be used 
in the context of certain legacy computing platforms. As it is, I consider 
those characters defective.

For context, in L2/25-037, I have identified a fundamental defect in how 
PETSCII, Apple II, and HP 264x characters were encoded in Unicode. The box 
drawing characters (which depend on the typeface weight) and the block elements 
(which depend on fractions of bounding box size) were unified with each other, 
which in some cases contradicted the source legacy platforms. The Unicode 13.0 
mapping table of PETSCII and Apple II characters relied on the assumption that 
the thickness of light box drawing characters is equal to 1÷8 of the width or 
height of the character. This assumption is incorrect for the C64 version of 
PETSCII (where the thickness is 1÷4 of the width and height) and for Apple II 
(where the thickness is 1÷7 of the width and 1÷8 of the height). In the case of HP 
264x, two of the characters that were unified to the same Unicode character 
were identified to have not only distinct glyphs but also distinct types of box 
drawing connections, and both characters occur within the same encoding, 
leaving the Unicode mapping incomplete.
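
The arithmetic behind these fractions can be checked with a minimal sketch. The cell sizes (8×8 pixels for PETSCII, 7×8 for Apple II) and the line thickness in pixels are assumptions chosen to reproduce the fractions stated above; the names `PLATFORMS`, `line_fractions`, and `matches_eighth` are hypothetical and not part of any proposal or mapping table.

```python
from fractions import Fraction

# Assumed character cell sizes (width, height) and light-line thickness in
# pixels. These values are illustrative assumptions that reproduce the
# fractions quoted in the message; they are not taken from a mapping table.
PLATFORMS = {
    "PETSCII PET/VIC20": {"cell": (8, 8), "line_px": 1},
    "PETSCII C64": {"cell": (8, 8), "line_px": 2},
    "Apple II": {"cell": (7, 8), "line_px": 1},
}


def line_fractions(cell, line_px):
    """Return the line thickness as a fraction of cell width and cell height."""
    width, height = cell
    return Fraction(line_px, width), Fraction(line_px, height)


def matches_eighth(cell, line_px):
    """True only if the line is exactly 1/8 of both the width and the height."""
    return all(f == Fraction(1, 8) for f in line_fractions(cell, line_px))


for name, spec in PLATFORMS.items():
    of_width, of_height = line_fractions(spec["cell"], spec["line_px"])
    verdict = "matches" if matches_eighth(spec["cell"], spec["line_px"]) else "does not match"
    print(f"{name}: {of_width} of width, {of_height} of height, {verdict} 1/8")
```

Under these assumptions only the PET/VIC20 entry satisfies the 1÷8 premise in both directions.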

The response to this proposal in L2/25-010 is fundamentally logically incorrect 
and does not provide any feedback whatsoever. In that response, terms like 
'differences in plain text', 'glyph distinctions', 'character identities' or 
'appropriate fonts' are thrown around as buzzwords, completely defying all 
logic. The proposal already thoroughly explains why the Unicode 13.0—17.0 
mapping is defective and why the proposed characters have a completely 
different identity from existing characters, which also makes it impossible to 
resolve with appropriate fonts.

However, what makes this especially problematic is that some of the Unicode 
characters were encoded for compatibility with legacy platforms, but the 
fundamental character identity that the characters were encoded with is not 
compatible with the original identity of the characters in the source platform.

The characters 1FB70—1FB7F, according to the L2/19-025 compatibility table 
(19025-aux-LegacyComputingSources.pdf), were encoded for compatibility with 
PETSCII, but their character identity as specified in Unicode is defined in 
terms of 1÷8 blocks. This already makes the characters incompatible with the C64 
version of PETSCII. The characters also fit into the 1÷8 blocks encoded in 2581 
258F 2594—2595, but as PETSCII includes both light box drawings and fractions 
of blocks, those characters are where the two groups of characters 
'intersect', which causes the true top/bottom (but not left/right) light box 
drawings to be mapped to different values, as I already thoroughly explained in 
L2/25-037. Moreover, the PETSCII character 0x5D is mapped to both U+2502 and 
U+1FB73, and the PETSCII character 0x40 is mapped to both U+2500 and U+1FB79. 
However, in legacy computing text modes, all character tiles have a 1∶1 
mapping to a fixed size region of the screen, and all the tiles are independent 
from each other, so it makes no sense whatsoever to use multiple Unicode 
characters to represent the same legacy character. In the context of both 
PET/VIC20 and C64 versions of PETSCII, the characters representing horizontal 
and vertical lines match the thickness of the common light box drawing 
characters, and do not match 1÷8 blocks in C64; therefore it is inappropriate 
to identify them as a set of 1÷8 blocks. The same applies to the Apple II compatibility 
characters 1FB7C 1FB80—1FB81 1FBB5—1FBB8 1FBBC, which are also defective for 
reasons I explained in L2/25-037. Some of those characters are also used in 
other platforms (across both 13.0 and 16.0), which I haven't analyzed 
thoroughly but which also have similar issues.
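
The one-to-many mappings described above can be detected mechanically. The following is a minimal sketch that uses only the four pairs quoted in this message, not the full L2/19-025 table; `MAPPING_PAIRS` and `one_to_many` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# (PETSCII byte, Unicode code point) pairs quoted in the message.
# This is NOT the full mapping table, only the cases named above.
MAPPING_PAIRS = [
    (0x5D, 0x2502),
    (0x5D, 0x1FB73),
    (0x40, 0x2500),
    (0x40, 0x1FB79),
]


def one_to_many(pairs):
    """Return the legacy bytes that map to more than one Unicode code point."""
    targets = defaultdict(set)
    for legacy_byte, code_point in pairs:
        targets[legacy_byte].add(code_point)
    return {
        legacy_byte: sorted(code_points)
        for legacy_byte, code_points in targets.items()
        if len(code_points) > 1
    }


conflicts = one_to_many(MAPPING_PAIRS)
for legacy_byte, code_points in sorted(conflicts.items()):
    names = ", ".join(f"U+{cp:04X}" for cp in code_points)
    print(f"0x{legacy_byte:02X} -> {names}")
```

In a grid of independent character tiles, every legacy byte reported by such a check needs a rule for choosing one target, which is the ambiguity at issue here.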

Therefore, 1FB70—1FB81 1FBB5—1FBB8 1FBBC are defective, because their character 
identity mismatches that of the original characters on the source platforms. 
The Unicode 16.0 change of character identity of U+1FB81 does not resolve the 
issue either as it makes the third and fifth blocks unspecified but still 
enforces 1÷8 blocks on top and bottom. This also cannot be resolved by changing 
the identity of those characters to light box drawings or unspecified thickness 
because it would violate the consistency with 2581 258F 2594—2595 and disrupt 
implementations that rely on that consistency. And forget about contextual 
substitutions and other overcomplicated mechanisms, because they're completely 
irrelevant in the context of a grid of independent character tiles.

Regarding the L2/25-010 claim that this issue 'can be solved by using 
appropriate fonts': in the case of PETSCII PET/VIC20, the source platform font 
does in fact match the character identities of the Unicode mapping. In the case 
of Apple II, the source platform font could be considered to match the character 
identities if the left and right 1÷8 blocks are rounded to 1 pixel within the 
width of 7 pixels, but it makes no sense for the character identities to hinge on 
platform-specific rounding when there is already a consistent light box drawing 
thickness to work with. In the case of PETSCII C64, the source platform font 
does not match the character identities of the Unicode mapping, making it 
impossible to resolve using 'appropriate fonts'. In the case of HP 264x, the source 
platform font has two different glyphs for two different character identities 
in the same encoding for the same Unicode character, which also makes it 
impossible to resolve using 'appropriate fonts'. So how is anyone ever supposed 
to use those characters in the context of PETSCII C64 or HP 264x encoding?
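
The rounding argument can be made concrete with a short sketch. It assumes that a 1÷8 block is rendered by rounding to the nearest whole pixel (half up, minimum 1) and that the source fonts draw light lines 1 pixel wide on Apple II and 2 pixels wide on C64; both are illustrative assumptions consistent with the fractions given earlier, and the function names are hypothetical.

```python
from fractions import Fraction


def eighth_block_px(cell_extent_px):
    """Pixels covered by a 1/8 block, rounded half up, with a minimum of 1.

    The rounding rule is an assumption made for illustration only.
    """
    exact = Fraction(cell_extent_px, 8)
    return max(1, int(exact + Fraction(1, 2)))


def font_can_reconcile(cell_extent_px, source_line_px):
    """True if a rounded 1/8 block has the same thickness as the source line."""
    return eighth_block_px(cell_extent_px) == source_line_px


# Apple II, 7-pixel width: 7/8 of a pixel rounds to 1, equal to the assumed line.
apple_ii_width_ok = font_can_reconcile(7, 1)
# C64, 8-pixel width: a 1/8 block is 1 pixel, but the assumed line is 2 pixels.
c64_width_ok = font_can_reconcile(8, 2)

print("Apple II width reconciled by rounding:", apple_ii_width_ok)
print("C64 width reconciled by rounding:", c64_width_ok)
```

Under these assumptions the Apple II case agrees only because of rounding, while for C64 no rounding of a 1÷8 block yields the source thickness.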
