Hi, does anyone have any answers to this question?
From: [email protected]
To: [email protected]
Subject: Diacritical marks: Single character or combined character?
Date: Fri, 8 Nov 2013 18:37:29 +1100
Hi all,
I would like to know if there is a best practice or recommendation as to which
method to use when representing letters with diacritical marks. For example,
take the following two characters:
ā
ā
They may look the same; however, the first is a single character, U+0101, while
the second is a sequence of two: a regular a (U+0061) followed by the combining
macron (U+0304).
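To make the difference concrete, here is a minimal sketch in Python (chosen only for illustration; it uses nothing beyond the standard unicodedata module, and the helper name code_points is made up for this example). It lists the code points in each form and shows that a plain string comparison treats them as different:

```python
import unicodedata

precomposed = "\u0101"   # single code point: a with macron
decomposed = "a\u0304"   # two code points: a, then combining macron

def code_points(text):
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

for label, text in (("precomposed", precomposed), ("decomposed", decomposed)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, code_points(text), names)

# Without normalization the two strings do not compare equal.
print(precomposed == decomposed)  # False
```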
In producing content, which is better to use? When writing in languages
such as Turkish, there is a limited set of diacritical marks, all of
which are represented in the Unicode character set.
However, when writing statistical formulae, every symbol used, including both
Latin and Greek characters, can have a circumflex or overline added to it to
denote a particular meaning. In that case, I found myself using the relevant
character combined with U+0302 or U+0305 as needed.
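As a rough illustration of why the formula case ends up with combining characters, the Python sketch below (using the standard unicodedata module; the sample symbols are my own picks, not taken from any particular formula) applies NFC normalization, which composes a sequence only where Unicode defines a precomposed character. A y with a circumflex collapses to one code point, while a Greek beta with a circumflex and an x with an overline stay as two:

```python
import unicodedata

samples = {
    "y-hat": "y\u0302",          # y + combining circumflex (U+0302)
    "beta-hat": "\u03b2\u0302",  # Greek beta + combining circumflex
    "x-bar": "x\u0305",          # x + combining overline (U+0305)
}

# Length in code points after NFC; 1 means a precomposed character exists.
composed_lengths = {
    label: len(unicodedata.normalize("NFC", text))
    for label, text in samples.items()
}

for label, length in composed_lengths.items():
    print(label, length)
```

So for arbitrary symbols there is often no single-character alternative to choose in the first place.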
Now that I am switching between the two activities (writing stats stuff and
publishing transliterated content), I find myself unsure which method is
best, or whether one is better than the other at all.
I favour using a single method for all things, and so I am attracted to the
idea of using combining characters for everything. However, language parsing
tools for languages that use those accented letters may be fooled when
presented with U+0061 followed by U+0304 instead of the usual U+0101.
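One way to sidestep that mismatch, sketched in Python under the assumption that the text can be passed through Unicode normalization before it reaches such a tool (whether a given tool normalizes its own input is something I do not know), is to convert everything to one form first. The function names to_nfc, to_nfd and same_text are hypothetical helpers for this example:

```python
import unicodedata

def to_nfc(text):
    """Compose sequences into precomposed characters where Unicode defines them."""
    return unicodedata.normalize("NFC", text)

def to_nfd(text):
    """Decompose precomposed characters into base letter plus combining marks."""
    return unicodedata.normalize("NFD", text)

def same_text(a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return to_nfc(a) == to_nfc(b)

typed_with_combining = "a\u0304"
typed_precomposed = "\u0101"

print(to_nfc(typed_with_combining) == typed_precomposed)   # True
print(to_nfd(typed_precomposed) == typed_with_combining)   # True
print(same_text(typed_with_combining, typed_precomposed))  # True
```

With this approach the choice made while typing matters less, since either form can be converted to the other at the point of publishing or parsing.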
Any advice or guidance on this issue would be greatly appreciated.