Dear Unicoders,
There are some characters that have no precedent in existing encodings and are
also hard to attest directly from printed sources. Can one still make a solid
case for encoding those in Unicode?
I am thinking of characters that are either invisible (most of the time) or can
become invisible under certain circumstances.
Precedents
----------
- HYPHEN U+2010 is *always* rendered as a hyphen (i.e. a centered horizontal
  bar glyph), which may look identical to HYPHEN-MINUS U+002D.
- SOFT HYPHEN (SHY) U+00AD is *only* rendered as a hyphen *when* it appears at
  the end of a line.
- At least four existing math operators are *never* rendered with a visible
  glyph and only explicitly encode semantics where syntax is potentially
  ambiguous otherwise:
  * FUNCTION APPLICATION U+2061
    is used where no multiplication is implied,
    e.g. between an alphabetic function variable and an opening parenthesis:
    f(x).
  * INVISIBLE TIMES U+2062
    is used where multiplication by either TIMES U+00D7 or MIDDLE DOT U+00B7
    is implied,
    e.g. between a number and an alphabetic variable, constant or parenthesis:
    2πr(a+b).
  * INVISIBLE SEPARATOR U+2063
    is used where enumeration by a COMMA U+002C or SEMICOLON U+003B (and
    possibly whitespace) is implied,
    e.g. between two single-letter variable indices: aᵢⱼ.
  * INVISIBLE PLUS U+2064
    is used where addition by PLUS SIGN U+002B is implied,
    e.g. between an integer and a vulgar fraction: 1⅔.
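
The properties of the existing characters listed above can be checked against
the Unicode Character Database. The following is a minimal sketch using
Python's standard `unicodedata` module; the helper name `describe` and the
sample string `explicit` are made up for this illustration.

```python
import unicodedata

# Code points quoted in the list above.
CODE_POINTS = [0x002D, 0x2010, 0x00AD, 0x2061, 0x2062, 0x2063, 0x2064]

def describe(cp):
    """Return (U+XXXX label, character name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))

for cp in CODE_POINTS:
    print(*describe(cp))

# 2πr(a+b) with each implied multiplication made explicit by INVISIBLE TIMES.
explicit = "2\u2062π\u2062r\u2062(a+b)"

# Dropping all format characters (general category Cf) leaves the visible text.
visible = "".join(ch for ch in explicit if unicodedata.category(ch) != "Cf")
print(visible)
```

The two visible hyphens are reported as dash punctuation (Pd), whereas SHY and
the four invisible operators are format characters (Cf).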
Suggestions
-----------
- INVERSE SOFT HYPHEN (ISHY) or SOFT INVISIBLE HYPHEN (SIHY)
  is *always* rendered as a hyphen *unless* it appears at the end of a line.
- INVISIBLE HYPHEN (IHY) or ZERO-WIDTH HYPHEN (ZWH)
  is *never* rendered as a hyphen,
  *but* the word it appears in is treated as if it contained one at its
  position.
- INVERSE SOFT COMMA (ISC) or SOFT INVISIBLE COMMA (SIC)
  is *always* rendered as a comma *unless* it appears at the end of a line.
- INVISIBLE OPEN PARENTHESIS (IOP) and INVISIBLE CLOSE PARENTHESIS (ICP)
  *should not* be rendered with a visible glyph, but *may* be for inline
  fallback.
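
The rendering rules for SHY and for the first three suggestions can be
summarized as a small lookup table. This is only a sketch: the suggested
characters have no code points, so Private Use Area values stand in for them
here, and `RENDER` and `render_line` are names invented for the illustration.
The function expects a line that has already been broken, so only its last
character counts as being "at the end of a line".

```python
SHY = "\u00AD"   # existing SOFT HYPHEN
ISHY = "\uE000"  # hypothetical stand-in (Private Use Area), not an assigned code point
IHY = "\uE001"   # hypothetical stand-in
ISC = "\uE002"   # hypothetical stand-in

# character -> (glyph shown inside a line, glyph shown at the end of a line)
RENDER = {
    SHY: ("", "-"),
    ISHY: ("-", ""),
    IHY: ("", ""),
    ISC: (",", ""),
}

def render_line(line):
    """Render one already-broken line according to the table above."""
    out = []
    last = len(line) - 1
    for i, ch in enumerate(line):
        if ch in RENDER:
            inside, at_end = RENDER[ch]
            out.append(at_end if i == last else inside)
        else:
            out.append(ch)
    return "".join(out)

print(render_line("Spargel" + ISHY + "Suppe"))  # unbroken: hyphen shown
print(render_line("Spargel" + ISHY))            # broken after ISHY: no hyphen
print(render_line("Spar" + SHY))                # broken after SHY: hyphen shown
```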
ISHY/SIHY is especially useful for encoding (German) noun compounds in wrapped
titles, e.g. on product labeling, where hyphens are often suppressed for
stylistic reasons. For instance, the orthographically correct _Spargelsuppe_,
_Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be rendered as
_SpargelSuppe_ and could then be encoded as _Spargel<ISHY>Suppe_.
Like the existing invisible math operators, IHY/ZWH is used where its visible
counterpart (i.e. HYPHEN) would be required syntactically (i.e.
orthographically), but its presence can be derived from context and convention
(at least by human readers). This is useful for spell-checking, line-breaking
etc., e.g. for words (commercial names in particular) with internal capital
letters that would otherwise break orthographic rules and that should be broken
at the end of a line without a hyphen being added (i.e. like ISHY/SIHY, not
SHY). It is indeed very similar to ZERO WIDTH SPACE (ZWSP) and WORD JOINER
(WJ), except that ZWSP separates two words, whereas IHY/ZWH joins them into
one but, unlike WJ, still allows a line break.
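
The comparison with ZWSP and WJ can be made concrete with a rough sketch. IHY
again uses a Private Use Area stand-in, and the three helper functions are
hypothetical simplifications: real spell-checkers and the Unicode line
breaking algorithm are far more involved than this.

```python
HYPHEN = "\u2010"  # existing HYPHEN
ZWSP = "\u200B"    # existing ZERO WIDTH SPACE
WJ = "\u2060"      # existing WORD JOINER
IHY = "\uE001"     # hypothetical stand-in (Private Use Area)

def spellcheck_form(text):
    """Form a spell-checker would look up: IHY treated as a real hyphen."""
    return text.replace(IHY, HYPHEN)

def words(text):
    """Split into words: ZWSP separates words, IHY and WJ do not."""
    return text.replace(ZWSP, " ").split()

def break_opportunities(text):
    """Offsets after which a line break is allowed: after ZWSP or IHY, not WJ."""
    return [i + 1 for i, ch in enumerate(text) if ch in (ZWSP, IHY)]

name = "Spargel" + IHY + "Suppe"
print(spellcheck_form(name))      # checked as if written with HYPHEN
print(len(words(name)))           # still a single word
print(break_opportunities(name))  # a break is allowed after the IHY
```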
ISC/SIC is particularly useful in wrapping table headers where a possible line
break can take on the separating role of a comma.
IOP and ICP serve two purposes. They enclose mathematical expressions to
override the operator precedence that would otherwise apply, e.g. a sum in the
numerator or denominator of a fraction, and they enclose textual annotations
that should be displayed outside the normal row of characters, e.g.
ruby/furigana pronunciation hints. Both *may* be rendered inline where advanced
typographic functionality is unavailable and should then be parenthesized for
clarity.
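
The inline fallback for IOP and ICP could look roughly like the following
sketch. Both characters are represented by Private Use Area stand-ins, and
`render_inline` with its `advanced_layout` flag is a hypothetical helper: it
only models whether the two characters become visible, not the actual
fraction or ruby layout.

```python
IOP = "\uE003"  # hypothetical stand-in (Private Use Area)
ICP = "\uE004"  # hypothetical stand-in (Private Use Area)

def render_inline(text, advanced_layout):
    """Hide IOP/ICP when layout expresses the grouping, else show parentheses."""
    if advanced_layout:
        # The layout engine (stacked fraction, ruby) conveys the grouping.
        return text.replace(IOP, "").replace(ICP, "")
    # Plain inline fallback: make the grouping visible.
    return text.replace(IOP, "(").replace(ICP, ")")

fraction = IOP + "a+b" + ICP + "/c"
ruby = "漢字" + IOP + "かんじ" + ICP

print(render_inline(fraction, advanced_layout=False))
print(render_inline(ruby, advanced_layout=False))
```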