Re: [tex-hyphen] Apostrophe

Jonathan Kew Mon, 16 Jun 2008 08:51:52 -0700

On 16 Jun 2008, at 4:16 pm, Mojca Miklavec wrote:

IMO, where some patterns have traditionally included the apostrophe (x27),
we should probably provide duplicate patterns with U+2019 as well.


Any little/tiny chance to use some other way to achieve the same? It
seems like yet-another-hack to me, that will prevent us from direct
conversion to 8-bit patterns.

1.) create a list of equivalent characters

2.)
a) parse contents of \patterns and if some character in a pattern
belongs to that list, duplicate the pattern before it's passed to TeX

It ought to be possible to do this, I guess, but it's fairly painful as TeX macro programming. (For LuaTeX it could no doubt be done much more easily in Lua, but that doesn't help XeTeX.)

b) extend the engine (only XeTeX/LuaTeX in that case) in some way to
accept hints that some characters are equivalent during hyphenation. I
guess that \lccode does exactly that, but I'm not sure what will
happen if I set lccode of "adiaeresis" to lccode of "a" for example,
when I want to use some macro to do uppercasing/lowercasing of words
for me.

Or to take the specific example of the apostrophe, we could set \lccode"2019="27 (or vice versa, depending which way we want to write the patterns). But then if someone applies \lowercase to a run of text that includes the ’ character, they'll be surprised to see it changed to '.

The trouble is that \lccode is overloaded, being used for multiple purposes that may not always want the same set of mappings. I suppose if we had a separate \hyphequiv table, that would help -- but you're not getting a new feature like that in time for the TL2008 release!

I would really prefer not to introduce new hacks in patterns.
Apostrophe represents a single character, so it should be left as a
single character in patterns (assuming that we leave it there), only
TeX might see it in a different way.

The correct Unicode character to use would be U+2019, I think, so we could simply use that in the patterns and ignore U+0027. The trouble is that there are sure to be users who have U+0027 in their text, and expect this to behave the same way; in order to support both the "best practice" and the "ASCII-like" encoding of the data, we need two versions of the patterns. That's not really a "hack in patterns", IMO, it's a concession to the fact that real-life data will not always be encoded in the purest and best Unicode Way, and it may be helpful to try and support these "variant spellings" where possible.
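
For illustration, the "two versions of the patterns" idea could be handled as an offline preprocessing step rather than in TeX macros. What follows is only a minimal sketch in Python under stated assumptions: the function name duplicate_patterns, the EQUIVALENTS table and the sample patterns are all hypothetical, and it assumes the patterns have already been split into a plain list of strings. It is not how any existing pattern file is actually generated.

```python
# Hypothetical table of characters to treat as equivalent for hyphenation.
# Key: the character the patterns are written with; value: its twin.
EQUIVALENTS = {"\u0027": "\u2019"}  # ASCII apostrophe -> right single quotation mark


def duplicate_patterns(patterns, equivalents=EQUIVALENTS):
    """Return the patterns plus, for each pattern that contains a listed
    character, a copy with that character replaced by its equivalent.

    Order is preserved and no pattern is emitted twice.
    """
    result = []
    seen = set()
    for pattern in patterns:
        variant = pattern
        for source, target in equivalents.items():
            variant = variant.replace(source, target)
        # The variant equals the original when no listed character occurs,
        # so the 'seen' check also keeps unaffected patterns single.
        for candidate in (pattern, variant):
            if candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result


if __name__ == "__main__":
    sample = ["l'1a", "1ba", "d'2"]  # made-up patterns, for illustration only
    for p in duplicate_patterns(sample):
        print(p)
```

Swapping the key and value in the table gives the "vice versa" direction, with the patterns written using U+2019 and an ASCII twin generated for each.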

JK
