On 28 Jun 2008, at 9:02 am, Taco Hoekwater wrote:

Arthur Reutenauer wrote:
I've been thinking: Perhaps the final solution is to
do away with \lccode and \uccode completely and instead base the
system on Unicode properties?
  You don't say :-)

Well, there is a downside also: an interface to the Unicode properties
would have to be written too, lest we lose flexibility. TeX users
are used to being able to modify everything, so a static database
won't do.

Operations such as case-folding must allow "tailoring" because the properties in the UCD are defaults, not necessarily correct for every language. (Consider the casing behavior of i in Turkish, to take one well-known example.)
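
Here is a minimal sketch of what tailoring means in practice. It is Python, not LuaTeX code, and `TURKISH_LOWER` and `lower_tailored` are hypothetical names. A per-language override table is consulted first, and the code falls back to the default mapping, here taken from Python's `str.lower()`, which follows the UCD:

```python
# Hypothetical per-language tailoring: overrides consulted before the
# default (UCD-derived) lowercase mapping that str.lower() provides.
TURKISH_LOWER = {
    "I": "\u0131",  # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
    "\u0130": "i",  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> plain i
}

def lower_tailored(text, tailoring=None):
    """Lowercase text one character at a time, preferring the tailoring table."""
    tailoring = tailoring or {}
    return "".join(tailoring.get(ch, ch.lower()) for ch in text)

print(lower_tailored("ISTANBUL"))                 # default: "istanbul"
print(lower_tailored("ISTANBUL", TURKISH_LOWER))  # Turkish: dotless i first
```

Without the tailoring, U+0130 lowercases to "i" followed by U+0307 COMBINING DOT ABOVE. That is the UCD default, but Turkish text wants something else. Working one character at a time also ignores context-sensitive rules such as Greek final sigma, so a real implementation would need more than a lookup table.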

And we mustn't forget that users may need to provide properties for PUA codepoints they're using, even if they don't normally need to modify standard Unicode properties.
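
A similarly hedged sketch covers the PUA case. It layers user-supplied properties over the defaults, with Python's `unicodedata` module standing in for the UCD. The property table and function names are hypothetical. The only thing the UCD itself says about a codepoint such as U+E000 is that its general category is Co (private use).

```python
import unicodedata

# Hypothetical user-supplied properties for Private Use Area codepoints.
# The UCD only classifies these as "Co"; anything more specific
# (letter category, case pairing) has to come from the user.
USER_PROPERTIES = {
    0xE000: {"category": "Lu", "lower": 0xE001},
    0xE001: {"category": "Ll", "upper": 0xE000},
}

def char_category(cp, overrides=USER_PROPERTIES):
    """Return the general category, preferring user overrides over the UCD."""
    props = overrides.get(cp, {})
    if "category" in props:
        return props["category"]
    return unicodedata.category(chr(cp))

def to_lower(cp, overrides=USER_PROPERTIES):
    """Lowercase one codepoint: user override first, else a 1:1 default mapping."""
    props = overrides.get(cp, {})
    if "lower" in props:
        return props["lower"]
    lowered = chr(cp).lower()
    # Leave the codepoint unchanged when the default mapping is not 1:1.
    return ord(lowered) if len(lowered) == 1 else cp

print(char_category(0xE000), char_category(0xE002))  # user value vs. UCD "Co"
```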

  The irony here is that LuaTeX doesn't complain about duplicate
patterns anymore since the hyphenation-handling code moved over to
libHnj last October, and part 43 of the original TeX code disappeared
entirely; Taco, can you comment on that?

I could have added such testing code, but it seemed a bit pointless.
Duplicate patterns are harmless, after all; they just waste a few CPU
cycles.

There are two slightly different cases, and they might merit different handling. Truly duplicated patterns

  a1b
  a1b

could be silently ignored as harmless, or perhaps a warning logged; on the other hand, patterns that have the same sequence of letters but different hyphenation weights

  a1b
  a2b

should probably be reported as "conflicting" rather than "duplicate". (TeX does not currently distinguish between these two situations; it just gives the "duplicate" error for both.)
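
The distinction can be sketched in a few lines of Python. This is not how TeX or libHnj store patterns. `parse_pattern` and `check_patterns` are hypothetical helpers that key each pattern on its letter sequence (treating "." like a letter) and then compare the inter-letter weights:

```python
def parse_pattern(pattern):
    """Split a Liang-style pattern like 'a1b' into ('ab', (0, 1, 0)).

    weights[i] is the digit in front of letters[i]; the final entry is the
    digit after the last letter. Positions with no digit get weight 0.
    """
    letters = []
    weights = [0]
    for ch in pattern:
        if ch in "0123456789":
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), tuple(weights)

def check_patterns(patterns):
    """Return (duplicates, conflicts) among patterns sharing a letter sequence."""
    seen = {}  # letters -> (weights, first pattern text seen)
    duplicates, conflicts = [], []
    for pat in patterns:
        letters, weights = parse_pattern(pat)
        if letters not in seen:
            seen[letters] = (weights, pat)
        elif seen[letters][0] == weights:
            duplicates.append(pat)  # identical: harmless, maybe log it
        else:
            conflicts.append((seen[letters][1], pat))  # same letters, new weights
    return duplicates, conflicts

print(check_patterns(["a1b", "a1b", "a2b"]))
```

Under these assumptions, the second "a1b" is reported as a duplicate and "a2b" as conflicting with the first "a1b".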

JK
