From: "Patrick Andries" <[EMAIL PROTECTED]>
> Philippe Verdy wrote:
> >From: "Patrick Andries" <[EMAIL PROTECTED]>
> >>Peter Kirk wrote:
> >>>What is U+2027 intended for? The name suggests that it might be what
> >>>is needed for Catalan.
> >>>[PA] Isn't this the one that should be used in dictionaries ?
> >>>
> >>See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
> >>2027
> >>HYPHENATION POINT
> >>Hyphenation point is primarily used to visibly indicate syllabification
> >>of words. Syllable breaks are potential line breaking opportunities in
> >>the middle of words. The hyphenation point It is mainly used in
> >>dictionaries and similar works. When an actual line break falls inside a
> >>word containing hyphenation point characters, the hyphenation point is
> >>rendered as a regular hyphen at the end of the line.
> >
> >This last sentence is wrong, at least in my Larousse dictionnaries:
> >
> I believe it simply describes certain practices (Anglo-Saxon, American
> ?), maybe this should be clearer.

This just demonstrates that the "one dot character fits all" strategy is too simplistic. Publications as serious as very common dictionaries actually use several distinct dots, each with its own semantics and rendering particularities. The Catalan middle dot is a plain orthographic letter and should be treated as such, not represented by borrowing a punctuation sign or symbol which may have other, conflicting uses.

What I suggested is that the general category, despite its weak definition, is still a good indicator of which character to use. So U+2027 (like the U+00B7 middle dot found in ISO-8859-1/15) is not the exact character to represent this middle dot in all usages, even if there is an important legacy of using the ISO-8859-1 middle dot in Catalan, or of using the L-with-middle-dot of ISO 6937. The latter was defined just for convenience with older technologies that could not acceptably display the sequence <L, middle-dot, L> in Catalan because of the excessive space, so a ligature was probably preferable in the Videotex context.

My opinion is that U+013F already meant two abstract characters in Teletext or Videotex, even for Catalan readers (and this can explain why it has a compatibility decomposition, as an acceptable but poor legacy fallback).

The other reason is that the middle dot, being punctuation, would likely have extra spacing on both sides, which would make it inappropriate for rendering Catalan words. Such punctuation would also probably forbid kerning the middle dot into the open area of an uppercase L, something which would be acceptable for reading Catalan (as it was acceptable with U+013F in Teletext/Videotex).
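
For what it is worth, the general categories and decompositions of the candidate characters can be checked against the Unicode Character Database. A minimal sketch, assuming Python and its standard unicodedata module; the describe helper and the CANDIDATES list are just names chosen here for illustration, and the values reported depend on the Unicode version bundled with the interpreter:

```python
import unicodedata

# Characters that come up in this thread (the selection is my own).
CANDIDATES = ["\u00B7", "\u2027", "\u013F", "\u0140"]


def describe(ch):
    """Return (name, general category, decomposition mapping) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),  # empty string when there is none
    )


if __name__ == "__main__":
    for ch in CANDIDATES:
        name, category, decomposition = describe(ch)
        print(f"U+{ord(ch):04X} {name}: gc={category}, decomp={decomposition or '(none)'}")

    # Compatibility normalization splits the precomposed Catalan letters
    # into a base letter followed by U+00B7 MIDDLE DOT.
    word = "col\u0140ecci\u00F3"
    print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", word)])
```

With current Unicode data both dots come out as Po (punctuation), while U+013F and U+0140 are letters (Lu and Ll) whose <compat> decomposition is L or l followed by U+00B7.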

I looked for handwritten forms of two lowercase l's with an intermediate middle dot, and they clearly show that Catalans write them without extra spacing: the dot fits well within the open area between the connecting baseline and the two ascending loops (sometimes it appears as a horizontal or slanted medial stroke that connects the two loops, or as a ligature of the two lowercase l letters, or the dot is put within the ascending loop of the first l).

I don't know which form Catalan children learn at school to write the three letters correctly, or whether they are taught that this dot is a diacritic or a special hyphen... My readings show only that there is no L-with-middle-dot in the Catalan alphabet, and that it is usually not considered a letter even though it represents a distinctive sound.

An interesting article about Catalan typesetting with TeX is at:
http://www.tug.org/TUGboat/Articles/tb16-3/tb48vali.pdf

* It notes that the usual middle dot (normally placed halfway between the baseline and the x-height) is not exactly what is needed for Catalan, where the dot should be placed halfway between the height of the usual middle dot and the ascender height. Another feature is that the dot should be equidistant from the two vertical stems of the lowercase or uppercase L's, which keep the normal distance they would have in the absence of this dot.
* So the dot is naturally kerned into the first uppercase L, but usually not between lowercase letters, where it takes its space within the inter-letter spacing.
* It also discusses the allowed hyphenations and their correct rendering...
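
The two placement rules reported from the TUGboat article amount to simple arithmetic. A rough sketch in Python, under my reading of those rules; the function names and the font metrics used below are made-up illustrations, not values taken from the article:

```python
def usual_dot_height(x_height):
    """Usual middle dot: halfway between the baseline (0) and the x-height."""
    return x_height / 2


def catalan_dot_height(x_height, ascender):
    """Catalan dot: halfway between the usual dot height and the ascender height."""
    return (usual_dot_height(x_height) + ascender) / 2


def catalan_dot_x(left_stem_x, right_stem_x):
    """Horizontal center of the dot: equidistant from the two vertical stems."""
    return (left_stem_x + right_stem_x) / 2


if __name__ == "__main__":
    # Hypothetical metrics in font units (1000 units per em).
    x_height, ascender = 500, 700
    print("usual dot height:  ", usual_dot_height(x_height))
    print("Catalan dot height:", catalan_dot_height(x_height, ascender))
    print("dot center x:      ", catalan_dot_x(100, 300))
```

The stems stay where they would be without the dot, so only the dot position is computed; nothing is added to the advance width of either l.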

