On 10/6/2019 10:59 PM, David Starner via Unicode wrote:
I still see the encoding of the original ellipsis as a mistake,
probably for compatibility with some older standard that included it
because the system wasn't smart enough to intelligently handle "..."
as ellipsis.

Agreed. A big part of it was "fixed width" fonts, particularly of the Asian variety, where the ellipsis may also have been baked into the layout. However, now that the code point exists, it has been integrated into the way fonts and applications handle layout.

Word, for example, appears to apply auto-correct (or does in the older version running on the machine I'm typing this on).

The point is, whatever the situation was in the late 1980s that led to the inclusion in Unicode in the first place isn't (can't be) the last word in defining this character: Unicode isn't merely passively modeling existing practice; via users and implementers there's feedback.

The practice seems to be that if you want a typographically sound ellipsis you may key in three periods, but what is stored is the code point for the ellipsis (and the layout for "random" three periods is not adjusted). In applications that do not offer that level of input support, you get a typographically imperfect representation.

That's actually not as bad as it sounds, because periods are so heavily overloaded that you'd want to be a bit careful assuming (without user override) that three of them are a true "ellipsis".

If there's no "typographically correct" form for a "comma ellipsis" then there's no difference ever between three of them and a comma ellipsis, and all further discussion is moot. However, assume there's an assertion that three commas need to be spaced differently if they are intended as a typographically correctly rendered comma ellipsis.

Asking for software to handle that on the fly (without the kind of override option provided by auto-correct or other input support mapping this to an ellipsis code point) would be wrong. One, because it assumes three commas can never be anything other than a "comma ellipsis", and two, because it would introduce a requirement that's at odds with how implementers (or at least a significant portion of them) have chosen to treat the 3-dot ellipsis.

There's even an argument that the whole thing is on par with input support resolving two hyphens into an en-dash and three into an em-dash, while making that subject to user override (via mapping to dedicated code points) rather than simply asserting special on-the-fly formatting.
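To make the comparison concrete, here is a minimal sketch of that kind of input support. It is not how Word or any particular application implements auto-correct; the table and the function names (`REPLACEMENTS`, `autocorrect`, `undo_autocorrect`) are made up for illustration. The point it shows is that the decision happens at input time, that the dedicated code point is what gets stored, and that the user can decline or reverse the mapping.

```python
# Hypothetical auto-correct table: typed sequence -> dedicated code point.
# Longer sequences come first so "---" is not mistaken for "--".
REPLACEMENTS = [
    ("...", "\u2026"),  # U+2026 HORIZONTAL ELLIPSIS
    ("---", "\u2014"),  # U+2014 EM DASH
    ("--", "\u2013"),   # U+2013 EN DASH
]


def autocorrect(text: str, enabled: bool = True) -> str:
    """Return the text to store once the user has typed `text`.

    Only a sequence at the very end of the text is mapped, mimicking
    a replacement made at the moment of typing. With enabled=False
    the literal characters are stored unchanged.
    """
    if not enabled:
        return text
    for sequence, code_point in REPLACEMENTS:
        if text.endswith(sequence):
            return text[: -len(sequence)] + code_point
    return text


def undo_autocorrect(stored: str) -> str:
    """User override: turn a trailing dedicated code point back into
    the literal sequence that was typed."""
    for sequence, code_point in REPLACEMENTS:
        if stored.endswith(code_point):
            return stored[: -len(code_point)] + sequence
    return stored
```

Text that never went through this step (or where the user undid it) keeps its three literal periods or hyphens, and nothing downstream has to guess whether they were meant as an ellipsis or a dash.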

(I also see little risk that there's a huge set of other multiple-punctuation sequences out there that could make a legitimate claim to be encoded, so treating ellipsis as a precedent does not promise to eat up code space by the plane-load).

A./
