On 12/4/2025 4:35 AM, [email protected] via Unicode wrote:
I have investigated the situation further, and it seems that the defect in
the Unicode 13.0-17.0 mapping is even more fundamental than I
previously thought. In particular, the proposal L2/25-037 does not
acknowledge the proposal L2/00-159, which had already been
incorporated into Unicode 3.2. In that proposal, the description of
characters U+23B8 (LEFT VERTICAL BOX LINE) and U+23B9 (RIGHT VERTICAL
BOX LINE) exactly matches the proposed characters L2/25-037:1FBFC (BOX
DRAWINGS LIGHT LEFT EDGE) and L2/25-037:1FBFD (BOX DRAWINGS LIGHT
RIGHT EDGE). In both proposals, those two characters are specified to
be aligned to the left or right edge, span the entire edge (extending to
the top and bottom), and match the thickness of Box Drawings Light
lines. The description of the characters U+23BA (HORIZONTAL SCAN
LINE-1) and U+23BD (HORIZONTAL SCAN LINE-9) also exactly matches the
proposed characters L2/25-037:1FBFA (BOX DRAWINGS LIGHT TOP EDGE) and
L2/25-037:1FBFB (BOX DRAWINGS LIGHT BOTTOM EDGE). In both proposals,
those two characters are specified to be aligned to the top or bottom
edge, span the entire edge (extending to the left and right), and
match the thickness of Box Drawings Light lines. However, the proposal
L2/00-159 had already set a precedent for using [U+23BA, U+23BD,
U+23B8, U+23B9] (and not the 1÷8 or 1÷4 blocks) in mappings to
certain platforms, such as the Heath/Zenith 19 Graphics Character Set
and the DEC Special Graphics Character Set. This contrasts with the
usage of 1÷8 blocks [U+2594, U+2581, U+258F, U+2595] and other related
1÷8 or 7÷8 block characters in the mapping to PETSCII and Apple II.
Therefore, there is a discrepancy between the legacy platforms added in
Unicode 3.2 (which use the box drawing lines 23B8, 23B9, 23BA, 23BD)
and the legacy platforms added in Unicode 13.0-17.0 (which use the 1÷8
blocks 2594, 2581, 258F, 2595).
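For reference, the two sets of edge characters can be listed side by side. The following is a minimal Python sketch using the standard unicodedata module; the dictionary names and helper functions are illustrative only and do not come from either proposal.

```python
import unicodedata

# Edge characters used by the Unicode 3.2 mappings (per L2/00-159).
BOX_LINE_EDGES = {
    "top": "\u23BA",
    "bottom": "\u23BD",
    "left": "\u23B8",
    "right": "\u23B9",
}

# One-eighth blocks used by the mappings to PETSCII and Apple II.
EIGHTH_BLOCK_EDGES = {
    "top": "\u2594",
    "bottom": "\u2581",
    "left": "\u258F",
    "right": "\u2595",
}


def describe(ch):
    """Return the code point and the formal Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def edge_table():
    """Pair the box-line and one-eighth-block character for each edge."""
    return [
        (edge, describe(BOX_LINE_EDGES[edge]), describe(EIGHTH_BLOCK_EDGES[edge]))
        for edge in ("top", "bottom", "left", "right")
    ]


if __name__ == "__main__":
    for edge, box_line, eighth_block in edge_table():
        print(f"{edge:>6}: {box_line}  vs  {eighth_block}")
```

The names reported are those in the Unicode Character Database shipped with the Python build in use, so the listing only shows which characters are being compared, not how any platform renders them.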
On 25 October 2025 at 10:27, [email protected] via Unicode
<[email protected]> wrote:
On 25 October 2025 at 08:29, Asmus Freytag via Unicode
<[email protected]> wrote:
Again, the identity of the Unicode character is given by
encoding the intended mappings. If Unicode decides to map the
same character to similar characters on different platforms,
that is not a problem, as long as implementers know that the
intent is to use a platform-specific rendering (and not assume
that there is only one possible rendering per character).
If you feel that the guidance available to implementers in the
text of the standard or in an annotation of the names list is
not sufficient, then the remedy would be to ask for the
explanation to be updated. We are unfortunately locked in as
far as character names are concerned, but we can add a note
(best in the text of the standard) that explains that
emulators for some systems will need an adjusted design so a
sequence or other arrangement of these characters looks correct.
Indeed the character names cannot be changed due to stability
policies. An explanatory note has been provided for U+1FB81 that
claims "The lines corresponding to 3 and 5 are not actually block
elements, but can show any horizontally repeating pattern", but
still implicitly enforces 1÷8 blocks for top and bottom. However,
this doesn't address other cases such as the PETSCII C64
variation. And if 1FB70-1FB81, 1FBB5-1FBB8, and 1FBBC were all noted to
no longer require exact 1÷8 blocks, that would still not remedy the
issue, because it would introduce an inconsistency with the
existing 1÷8 or 7÷8 block characters 2581, 2589, 258F, and 2594-2595,
which already have established compatibility precedents that
require the exact fraction, but are also used in the Unicode 13.0
mapping to PETSCII and Apple II character sets despite those
platforms using varying thickness (consistent with light box
drawings, except for the 1÷8 top and bottom blocks in C64, where
the 1÷4 top and bottom blocks are made consistent instead).
What is missing is an actual proposal. That is, not just analysis or
exposition, but actual proposed wording or proposed encoding that would
fix the issue.
That would need to be provided as a UTC document (aka L2 document)
submission, with the analysis appended in a background section.
A./
PS: I am not convinced that platform-specific mappings (glyphs) are an
issue, because the scenario where these data are reliably transferred
*between* legacy implementations can't have existed then, so it's
questionable why it needs to be perfect today. My assumption would be
that the use case is lossless round trip from (each) legacy emulator to
Unicode and back. Having PETSCII / Apple II specific characters does not
improve things, because any data stream containing those could not be
displayed on any other emulator. This is different from legacy
characters mapped to letters and common text symbols because we have an
expectation that we can share text across devices (or emulators).
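
The round-trip use case assumed above can be checked mechanically. The following is a minimal Python sketch, under the assumption that a legacy mapping is a plain table from byte values to single Unicode characters. The byte values in HYPOTHETICAL_TABLE are placeholders and not actual PETSCII or Apple II codes, and the function names are illustrative.

```python
# Placeholder table: the byte values are NOT real legacy code points.
HYPOTHETICAL_TABLE = {
    0x01: "\u2594",  # UPPER ONE EIGHTH BLOCK
    0x02: "\u2581",  # LOWER ONE EIGHTH BLOCK
    0x03: "\u258F",  # LEFT ONE EIGHTH BLOCK
    0x04: "\u2595",  # RIGHT ONE EIGHTH BLOCK
}


def invert(to_unicode):
    """Build the reverse table, rejecting tables that are not one-to-one."""
    from_unicode = {}
    for byte, ch in to_unicode.items():
        if ch in from_unicode:
            raise ValueError(
                f"U+{ord(ch):04X} is mapped from both "
                f"0x{from_unicode[ch]:02X} and 0x{byte:02X}"
            )
        from_unicode[ch] = byte
    return from_unicode


def round_trips(data, to_unicode):
    """Return True if legacy bytes survive conversion to Unicode and back."""
    from_unicode = invert(to_unicode)
    text = "".join(to_unicode[b] for b in data)
    return bytes(from_unicode[ch] for ch in text) == data


if __name__ == "__main__":
    sample = bytes([0x01, 0x02, 0x03, 0x04, 0x01])
    print(round_trips(sample, HYPOTHETICAL_TABLE))
```

In this sketch, a round trip within one platform depends only on that platform's table being one-to-one. Whether another platform maps the same Unicode character to a differently drawn glyph does not affect the result.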