Otto Stolz wrote:

>> [ ... ] I'd imagine that users just type the two characters
>> [IJ or ij] separately, and that consequently most data in the real
>> world is like that.
>
> For "IJ",
> cf. <http://www.unicode.org/versions/Unicode8.0.0/ch07.pdf#G21150>.

I can't make Edge or Acrobat Reader DC jump to the bookmark (suggestions off-list, please), but I guess Otto referred to this passage, which ends with the point I was trying to make:

> Another pair of characters, U+0133 LATIN SMALL LIGATURE IJ and its
> uppercase version, was provided to support the digraph "ij" in Dutch,
> often termed a "ligature" in discussions of Dutch orthography. When
> adding intercharacter spacing for line justification, the "ij" is kept
> as a unit, and the space between the i and j does not increase. In
> titlecasing, both the i and the j are uppercased, as in the word
> "IJsselmeer." Using a single code point might simplify software
> support for such features; however, because a vast amount of Dutch
> data is encoded without this digraph character, under most
> circumstances one will encounter an <i, j> sequence.
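To make the titlecasing point concrete, here is a minimal Python sketch. The helper dutch_titlecase_word() is hypothetical and uses a simplified heuristic: it folds U+0132/U+0133 to <i, j> via NFKC and then uppercases a word-initial "ij" as a unit. It is not a complete treatment of Dutch orthography.

```python
import unicodedata

def dutch_titlecase_word(word: str) -> str:
    """Titlecase one Dutch word, treating a leading <i, j> as a digraph.

    Hypothetical helper: a simplified heuristic only.
    """
    # NFKC maps U+0132 -> "IJ" and U+0133 -> "ij", so both encodings
    # end up as the plain two-letter sequence most data already uses.
    word = unicodedata.normalize("NFKC", word)
    if word[:2].lower() == "ij":
        # Uppercase both letters of the digraph: "ijsselmeer" -> "IJsselmeer".
        return "IJ" + word[2:].lower()
    # Otherwise, ordinary titlecasing of the first letter only.
    return word[:1].upper() + word[1:].lower()

# Generic titlecasing gets the <i, j> spelling wrong...
print("ijsselmeer".title())                    # Ijsselmeer
# ...but handles the single code point, whose titlecase mapping is U+0132.
print("\u0133sselmeer".title() == "\u0132sselmeer")  # True
# The digraph-aware helper treats both spellings alike.
print(dutch_titlecase_word("ijsselmeer"))      # IJsselmeer
print(dutch_titlecase_word("\u0133sselmeer"))  # IJsselmeer
```

This illustrates the trade-off in the passage: the single code point makes generic titlecasing come out right, but since real data mostly contains <i, j>, software still has to handle the two-letter sequence itself.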

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸