Re: Are these characters encoded?

Asmus Freytag Sun, 02 Dec 2001 22:14:20 -0800

At 05:29 PM 12/1/01 -0600, David Starner wrote:
> > It is certainly not a glyph variant of an ampersand. An ampersand is
> > a ligature of e and t. This is certainly an abbreviation of och. That
> > both mean "and" is NOT a reason for unifying different signs.
>
>But the fact that they never appear in the same text in the same font,
>and that one appears in handwritten text in the same places as the
>ampersand appears in machine written text means that it is a glyph
>variant. In any case, if it never appears in machine-written text, (if
>there's no font, as you point out for proposed ConScript additions),
>then there's no need to encode it.


Signs for faithful renderings of manuscript are - at least at the moment -
somewhat outside the scope of Unicode. Having said that, an exception for
current practice can certainly be considered, as instances of type-set
"handwriting" are not generally uncommon, even if we can't lay our hands on
them on demand. So, on this aspect of the character alone I would not like
to make a ruling one way or another, but getting a printed 'och' would
certainly make the counterargument moot.

I wish that Unicode encoding principles were as easy as "If entity A only
occurs in one context and entity B occurs only in another, they can be
unified". Well, taking this argument to extreme, we could unify a lot of
unrelated things. Unicode might have fit in 64K after all. ;-)

Michael's argument that "and" (Sw. 'och') and "et" are different words and
need to be distinguished on that score alone is interesting, because
semantics and usage are so close. For letters we have long held that if it
is the same letter, we don't disunify it across languages. Why this
necessarily breaks down for abbreviations of a near-universal word such as
'and' is not necessarily clear.

However, the Swedish case is really that the handwriting uses o-underbar
*NOT* in place of the ampersand, but in places where the typeset text
presumably would have the word 'och' spelled out. In fact, I would guess
that a handwritten text referring to a company name, for example
Rabén&Sjögren might use the & and not the o-underbar in Swedish. I don't
know this for sure, but I strongly suspect that such differentiation of
usage exists that would make it awkward to convert printed handwriting
into printed text by a pure font change.

Overloading the existing 00BA º is tempting, but would likely result in
incorrect output unless special purpose (read private use) fonts are used,
or unless it became common to have Swedish glyph overrides in fonts and
rendering engines that applied them. Since the usage and typographic
convention for 'och' and the raised o for numbering are not related, this
unification smells more of shoehorning than encoding.

(BTW it's not B0 as someone noted, that's a raised digit 0).

The strongest surviving candidate is the composed sequence U+006F U+0332,
but 0332 is an underscore, and not something that sits on-line. Again,
it would take special-purpose or specifically Swedish aware fonts and/
or rendering engines that support them to get the right result. That would
argue against this particular unification - even though it would be quite
acceptable for rough plain text usage.
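
For anyone who wants to check what the two candidates actually are at the
code point level, here is a minimal sketch using Python's standard
unicodedata module. The helper name describe() is made up for illustration,
and nothing here says how a given font will draw the result; it only shows
the character identities and how normalization treats them.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The two unification candidates discussed above.
ordinal = "\u00ba"        # the existing 00BA
o_underbar = "o\u0332"    # the composed sequence U+006F U+0332

for label, text in (("00BA", ordinal), ("o + 0332", o_underbar)):
    print(label, describe(text))

# There is no precomposed form for o + 0332, so NFC leaves it as a sequence.
print(unicodedata.normalize("NFC", o_underbar) == o_underbar)

# 00BA carries a compatibility mapping to a plain "o", so NFKC folds it away.
print(unicodedata.normalize("NFKC", ordinal) == "o")
```

The last check is one more practical reason to be wary of overloading 00BA:
any process that applies compatibility normalization would turn the sign
into an ordinary letter o.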

If the character can be shown to have as much justification for existence
as a coded character as similar characters in the standard, i.e. if it's
ever used in printed handwriting, etc., etc., then we will have a tough
time coming up with a unification that's not (far) worse than just adding
it by itself.

A./
