On 28 Jun 2008, at 9:02 am, Taco Hoekwater wrote:
Arthur Reutenauer wrote:
I've been thinking: perhaps the final solution is to do away with
\lccode and \uccode completely and instead base the system on
Unicode properties?
You don't say :-)
Well, there is a downside also: an interface to the Unicode properties
would have to be written too, lest we lose flexibility. TeX users
are used to being able to modify everything, so a static database
won't do.
Operations such as case-folding must allow "tailoring" because the
properties in the UCD are defaults, not necessarily correct for every
language. (Consider the casing behavior of i in Turkish, to take one
well-known example.)
And we mustn't forget that users may need to provide properties for
PUA codepoints they're using, even if they don't normally need to
modify standard Unicode properties.
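
To make the two requirements above concrete, here is a minimal sketch of a tailorable case mapping: the Unicode defaults are used unless a per-language or user-supplied override says otherwise. It is written in Python purely for illustration. The function and table names are hypothetical and do not correspond to any existing TeX, LuaTeX or XeTeX interface, and the PUA pairing is a made-up assignment.

```python
# Illustrative sketch only: every name here is hypothetical,
# not an existing TeX/LuaTeX/XeTeX interface.

# Language tailoring layered over the default Unicode mappings.
TURKISH_UPPER = {"i": "\u0130"}  # i -> LATIN CAPITAL LETTER I WITH DOT ABOVE
TURKISH_LOWER = {"I": "\u0131"}  # I -> LATIN SMALL LETTER DOTLESS I

# A user-defined pairing for private-use codepoints (made-up assignment).
PUA_UPPER = {"\ue001": "\ue000"}


def to_upper(text, overrides=None):
    """Uppercase text, consulting overrides before the built-in defaults."""
    overrides = overrides or {}
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


def to_lower(text, overrides=None):
    """Lowercase text, consulting overrides before the built-in defaults."""
    overrides = overrides or {}
    return "".join(overrides.get(ch, ch.lower()) for ch in text)
```

With no overrides, to_upper("i") gives the default "I"; with TURKISH_UPPER it gives U+0130 instead, and a PUA codepoint that the defaults leave untouched can be given a mapping in the same way.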
The irony here is that LuaTeX doesn't complain about duplicate
patterns anymore since the hyphenation-handling code moved over to
libHnj last October, and part 43 of the original TeX code disappeared
entirely; Taco, can you comment about that?
I could have added such testing code, but it seemed a bit pointless.
Duplicate patterns are harmless, after all; they just waste a few CPU
cycles.
There are two slightly different cases, and they might merit
different handling. Truly duplicated patterns
a1b
a1b
could be silently ignored as harmless, or perhaps a warning logged;
on the other hand, patterns that have the same sequence of letters
but different hyphenation weights
a1b
a2b
should probably be reported as "conflicting" rather than "duplicate".
(TeX does not currently distinguish between these two situations; it
just gives the "duplicate" error for both.)
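
The distinction could be drawn roughly as in the sketch below, again in Python for illustration only. It is not how TeX or LuaTeX loads patterns, and the function names are hypothetical. Each pattern is split into its letters and its inter-letter weights; a repeated letter sequence is "duplicate" if the weights agree and "conflicting" if they differ.

```python
# Illustrative sketch only: not the TeX or LuaTeX pattern loader.

def parse(pattern):
    """Split a pattern such as 'a1b' into ('ab', (0, 1, 0))."""
    letters = []
    weights = [0]  # one weight slot before each letter and one at the end
    for ch in pattern:
        if ch in "0123456789":
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), tuple(weights)


def classify(patterns):
    """Return (duplicates, conflicts) found while reading patterns in order."""
    seen = {}  # letters -> (weights, first pattern with those letters)
    duplicates, conflicts = [], []
    for pattern in patterns:
        letters, weights = parse(pattern)
        if letters not in seen:
            seen[letters] = (weights, pattern)
        elif seen[letters][0] == weights:
            duplicates.append(pattern)  # harmless repeat
        else:
            conflicts.append((seen[letters][1], pattern))  # weights disagree
    return duplicates, conflicts
```

Given a1b, a1b, a2b in that order, the second a1b is reported as a duplicate and a2b as conflicting with the first a1b. A loader could then ignore or merely log the former and raise an error for the latter.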
JK