On Mon, 11 Mar 2013 05:27:35 +0100 Philippe Verdy <[email protected]> wrote:
> 2013/3/10 Richard Wordingham <[email protected]>:
> > If we unify U+00B7's three possible roles of (a) digraph breaker,
> > (b) ano teleia and (c) decimal point, we could have the following
> > scheme:
> >
> > (1) Before digit, use decimal point glyph;
> > (2) Else before letter, use digraph breaker glyph;

> Note that this case 2 includes Catalan where it is more than just a
> digraph breaker (between two l/L), and where it plays a role similar
> to a diacritic for the letter (l/L) before it. This complicates things
> a bit when the letter before it is a capital L, because it will
> typically be kerned into it (except possibly in cursive decorated
> fonts).

Are you sure that's <U+004C, U+00B7> and not U+013F? They're not
canonically equivalent. I didn't see any kerning with LibreOffice's
default font.

> Your algorithm may be in fact part of substitution rules
> implemented in fonts.

That's where I would expect it to be implemented.

> But as a digraph breaker in Catalan, it also plays a role in line
> breakers (where the dot remains at end of line and will not be
> followed by a visible hyphen). In which case there's an extra
> complication: line breaks may already be part of the encoded text and
> you need another case:
>
> (2b) Else before end of line, use digraph breaker glyphs.

> Can this extra case work with Greek's use as ano teleia?

Underlying text <U+004C, U+00B7, U+2028 LINE SEPARATOR> seems to be
totally ambiguous between the 'digraph breaker' and ano teleia. If
they shall be rendered differently, the renderer will fail. It seems
it can only be handled in a way that no compliant process can require,
namely to distinguish U+0387 GREEK ANO TELEIA and U+00B7 MIDDLE DOT.

It reminds me of the apparently unusable Unicode property Diacritic -
U+00B7 has the property and canonically equivalent U+0387 does not.

Richard.
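
For anyone who wants to check the equivalence claims and see where the context rules run out, here is a minimal Python sketch. It is only an illustration, not how any font or renderer actually implements this. The helper `middle_dot_role`, its return labels and the `LINE_ENDS` set are hypothetical names invented for the example; only the standard `unicodedata` module is real. The rules beyond (1), (2) and (2b) are not quoted in this message, so the sketch leaves that case undecided.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"
# Assumption: these characters count as "end of line" for rule (2b).
LINE_ENDS = {"\n", "\r", "\u2028", "\u2029"}


def middle_dot_role(text, i):
    """Hypothetical helper: classify the U+00B7 at text[i] by what follows it."""
    if text[i] != MIDDLE_DOT:
        raise ValueError("text[i] is not U+00B7")
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if nxt.isdecimal():
        return "decimal point"      # rule (1)
    if nxt.isalpha():
        return "digraph breaker"    # rule (2)
    if nxt == "" or nxt in LINE_ENDS:
        # Rule (2b) would say "digraph breaker", but an ano teleia can
        # sit in exactly the same position, so context cannot decide.
        return "ambiguous"
    return "undecided"              # later rules are not quoted here


def canonical_facts():
    """Collect the normalization facts referred to in the message."""
    return {
        # U+0387 has a canonical singleton decomposition to U+00B7,
        # so normalization erases the distinction.
        "ano_teleia_nfc": unicodedata.normalize("NFC", "\u0387"),
        # U+013F only has a compatibility decomposition to <L, U+00B7>.
        "l_middle_dot_nfc": unicodedata.normalize("NFC", "\u013f"),
        "l_middle_dot_nfkc": unicodedata.normalize("NFKC", "\u013f"),
        "l_middle_dot_decomp": unicodedata.decomposition("\u013f"),
    }
```

Calling `middle_dot_role("L\u00b7\u2028", 1)` gives the "ambiguous" label, which is the point about <U+004C, U+00B7, U+2028>: nothing in the encoded text tells the two readings apart once U+0387 has been normalized to U+00B7.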

