On 12/26/2025 12:40 AM, [email protected] via Unicode wrote:
When I say Unicode 'supports' a character set I mean that Unicode
includes all characters in that character set.
The minimal requirement is that you can express external data (created
in that character set) in a unique way as a stream of Unicode characters
so that when you re-export the same data no distinctions have been lost.
The next requirement is that characters that are clearly analogous to
existing Unicode characters are unified. That ensures that generic
algorithms can be used to process the content part of such externally
generated data while it is encoded in Unicode.
This also implies that if the same content is created in different
external data sets, the data can be compared, and equal content gets
equal representation.
There are edge cases where characters are less part of "text content" and
more specific to a given external environment (and thus make no sense if
taken from one external environment, translated via Unicode, and shown in
another such external environment). (I'm assuming all those external
sets are "legacy" environments that display data in ways very different
from normal Unicode text documents.)
Note that a "grid of character cells" is a two-dimensional layout
of glyphs. That is different from the Unicode character-glyph
model and plain text. If a system wants to reproduce VT330/VT340
behaviour, it needs to layer a higher-level protocol on top of
Unicode plain text. So, don't expect Unicode plain text and
character-glyph models to reproduce a VT330. And the higher-level
protocol can specify use of a font which has the glyph shapes and
alignments that fit the VT330 behaviour.
The character grid itself is different from plain text, but individual
characters within that grid still need to correspond to plain text
characters.
Yes and no.
We discussed the 3- and 5-story summation operators made from glyph
pieces. This is conceptually similar to how layout systems for "regular"
text documents use glyph pieces for large fences and integral signs.
After looking at the details, it seems well motivated to extend the
existing 2-story summation operator to the 3- and 5-story versions by
encoding additional glyph pieces. This may even be useful outside the
VT330 environment.
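For reference, the 2-story version is already expressible with two
encoded glyph pieces, U+23B2 SUMMATION TOP and U+23B3 SUMMATION BOTTOM.
A minimal Python sketch (function names are illustrative) stacks them
one per character-cell row. The 3- and 5-story versions would need
additional pieces that are not encoded, so the sketch deliberately does
not invent code points for them.

```python
import unicodedata

# Glyph pieces already encoded for the 2-story summation sign.
SUMMATION_TOP = "\u23B2"
SUMMATION_BOTTOM = "\u23B3"


def two_story_summation() -> list[str]:
    """Return the pieces top to bottom, one per character-cell row."""
    return [SUMMATION_TOP, SUMMATION_BOTTOM]


def describe_pieces(pieces: list[str]) -> list[str]:
    """Label each piece with its code point and Unicode character name."""
    return [f"U+{ord(p):04X} {unicodedata.name(p)}" for p in pieces]
```

Because each piece is an ordinary encoded character with a standard
name, any Unicode-aware process can identify it without a private
agreement, which is what makes the extension interchangeable.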
I think there's a strong case to encode these.
I'm less convinced in the case of box drawing characters unless someone
can provide a reasonable scenario where it matters. The distinction is
that these do not appear to be "text content" in the same way as the
glyph pieces used in mathematical display equations that we discussed.
If the only expected use of these characters is within terminal emulation
software, then the only requirement that has to be satisfied is that
the mapping is "unique and unambiguous". There's no equally strong
requirement to represent precise semantics, because those semantics would
never need to be transportable.
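To make "unique and unambiguous" concrete, here is a minimal Python
sketch. It assumes a hypothetical legacy table mapping terminal byte
codes to Unicode code points, checks that no two legacy codes share a
code point, and confirms that import followed by export returns the
original bytes. The table contents and function names are illustrative
assumptions, not an authoritative VT330/VT340 mapping.

```python
# Hypothetical legacy table: terminal byte code -> Unicode code point.
# Illustrative subset only; not an authoritative VT330/VT340 mapping.
LEGACY_TO_UNICODE: dict[int, int] = {
    0x6A: 0x2518,  # BOX DRAWINGS LIGHT UP AND LEFT
    0x6B: 0x2510,  # BOX DRAWINGS LIGHT DOWN AND LEFT
    0x6C: 0x250C,  # BOX DRAWINGS LIGHT DOWN AND RIGHT
    0x6D: 0x2514,  # BOX DRAWINGS LIGHT UP AND RIGHT
    0x71: 0x2500,  # BOX DRAWINGS LIGHT HORIZONTAL
    0x78: 0x2502,  # BOX DRAWINGS LIGHT VERTICAL
}


def is_round_trip_safe(table: dict[int, int]) -> bool:
    """True if the mapping is injective: no two legacy codes share a code point."""
    return len(set(table.values())) == len(table)


def import_legacy(data: bytes, table: dict[int, int]) -> str:
    """Convert legacy bytes to a Unicode string using the table."""
    return "".join(chr(table[b]) for b in data)


def export_legacy(text: str, table: dict[int, int]) -> bytes:
    """Convert a Unicode string back to legacy bytes via the inverse table."""
    if not is_round_trip_safe(table):
        raise ValueError("mapping is ambiguous; cannot invert it")
    inverse = {cp: code for code, cp in table.items()}
    return bytes(inverse[ord(ch)] for ch in text)
```

Importing bytes([0x6C, 0x71, 0x6B]) yields "\u250C\u2500\u2510", and
exporting that string gives back the same three bytes. A table that sent
two distinct legacy codes to the same code point would fail
is_round_trip_safe, because re-export could no longer tell them apart.
Nothing in this check depends on which code points are chosen, which is
why private use code points would satisfy it just as well.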
A useful test would be whether assigning these to private use characters
would have any effect on how these specific characters are handled.
Private use characters are tied to specific fonts, and if specific fonts
are always tied to a given terminal emulator (if for no other reason than
to get the proportions of the "cells" correct), then the effect of using
private use characters is not observable. (This assumes, for example,
that no generic Unicode-based processing takes place and that the
drawing is handled by the emulator.)
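As an illustration of what "not observable" means from the outside, a
generic process that relies only on the Unicode Character Database
learns little about a private use character beyond its General Category
(Co). A minimal Python sketch using the standard unicodedata module; the
helper names are hypothetical.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True for characters whose General Category is Co (Private Use)."""
    return unicodedata.category(ch) == "Co"


def ucd_description(ch: str) -> str:
    """Report what a generic, UCD-based process can learn about ch."""
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    if is_private_use(ch):
        # Meaning comes only from a private agreement, e.g. a specific font.
        return f"U+{ord(ch):04X} (private use, no standard name)"
    return f"U+{ord(ch):04X} (unnamed)"
```

ucd_description("\u2500") returns "BOX DRAWINGS LIGHT HORIZONTAL", while
ucd_description("\uE000") only reports that U+E000 is private use. Inside
a terminal emulator that ships its own font, that difference never
surfaces; it matters only if some process outside the private agreement
has to interpret the characters.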
The case for encoding these is logically equivalent to making the case
that encoding them as private use characters actually impairs their
usability. That boils down to proving that they are, by necessity,
interpreted by processes that would not be expected to be party to a
private agreement on the meaning of these private use characters.
Terminal emulators for a specific terminal, by contrast, could be subject
to a private agreement without loss of generality.
So, unless the case is made that these have to be interchangeable in an
interpretable way, with an explanation of why, it will be a hard row to
hoe to convince SEW that the issue warrants disunifying the
line-drawing characters.
A./