----- Original Message -----
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Kenneth Whistler" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, April 16, 2004 12:03 AM
Subject: Re: U+0140

> On 15/04/2004 12:32, Kenneth Whistler wrote:
>
> >Philippe opined:
> >
> >>If there's something really missing for Catalan, it's a middle-dot letter with
> >>general category "Lo", and combining class 0 (i.e. NOT combining).
> >
> >The one thing for sure is that the Unicode Standard does not need
> >to encode more middle dots:
> >
> >00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
> >0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;;;;;N;;;;;
> >1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;
> >22C5;DOT OPERATOR;Sm;0;ON;;;;;N;;;;;
> >2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;;;;N;;;;;
> >302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;
> >30FB;KATAKANA MIDDLE DOT;Pc;0;ON;;;;;N;;;;;
> >FE45;SESAME DOT;Po;0;ON;;;;;N;;;;;
> >FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;;;;N;;;;;
> >10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
> >1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;;;;;N;;;;;
> >2027;HYPHENATION POINT;Po;0;ON;;;;;N;;;;;
> >16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;;;;;N;;;;;
> >1802;MONGOLIAN COMMA;Po;0;ON;;;;;N;;;;;
> >318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;;;;N;HANGUL LETTER ALAE A;;;;
> >1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;;;;;N;;;;;
> >
> >(and that's not considering the lowered dots "FULL STOP" and the raised
> >dots)
>
> There are also, including combining middle dots (most of these listed at
> U+00B7):
>
> U+0387 GREEK ANO TELEIA

Wrong form? It is a small square, it is the Greek semicolon, and so it separates words.

> U+05BC HEBREW POINT DAGESH OR MAPIQ

Where would you position it relative to the Catalan letter L, which has a different directionality and should not inherit the complexity of the Hebrew script? And why aren't U+0307 COMBINING DOT ABOVE and U+0323 COMBINING DOT BELOW in your list as well?

> U+2022 BULLET

Too thick, and it is a word-breaking symbol with a candidate line break on either side.
It is most often a bullet at the beginning of a sub-paragraph, but it can also be used, for example, to separate multiple titles (think of the titles on an audio CD), or in dictionaries and many other publications, where it serves as a mark anchoring a note.

> U+2024 ONE DOT LEADER

This is a spacing character, mostly a punctuation mark, and clearly word-breaking...

> U+2219 BULLET OPERATOR

This is a symbol with an evident word break on either side (think of mathematical formulas).

> U+2027 HYPHENATION POINT

A good suggestion, if it were not a punctuation mark... What is the exact status of this character? When I look into the UCD properties I see:

French name: POINT DE COUPURE DE MOT
GC=Po: punctuation, other [not even a "connecting" Pc like the ASCII underscore], so a separator of words
CC=0: not combining [OK]
BD=ON: other neutral [OK]

> What is U+2027 intended for? The name suggests that it might be what is
> needed for Catalan.

I think that this is better seen as an annotation used in dictionaries to show visually the position of candidate syllable breaks (unlike the soft hyphen, which is normally not rendered except where the candidate line break is realized). Many dictionaries prefer a thin vertical line which extends from the descender to the ascender, and in fact there are fonts where this character is drawn like this; such a line is not the same as the ASCII vertical line, which is smaller and often thicker. This notation symbol could be used in addition to, and immediately after, the Catalan middle dot...

My Larousse Catalan-French pocket dictionary uses a very thin vertical line to mark word terminations and prefixes/suffixes, in combination with an orthographic middle dot in the Catalan word, which is always noted. Question here: is that vertical line used in Larousse really the same as U+007C?

In the same context I note that the ASCII TILDE (a large version aligned on the baseline) is used to stand for the common radical, which is delimited by the vertical line symbol that separates prefixes and suffixes from the radical of the entry word...

In the same dictionary, the vertical line is also used, alone or in a pair, and surrounded by em spaces, as a separator between definition items, to group them by semantic proximity; but in that case the vertical line is thicker and does not extend below the baseline, so this separator looks more like a true U+007C, i.e. a regular punctuation mark, with candidate line breaks occurring both before and after it (in fact at the position of the surrounding em spaces)...

In a Larousse French-German dictionary, I can see the hyphenation point used between a determining prefix and the radical (for example: "ein*reisen" or "In*angriff*nahme"): this hyphenation point (noted here with a '*') is a notation symbol and is thicker. It is not even a middle dot, because it is drawn at the x-height. Nor is it a bullet, which is also used in the same Larousse dictionaries, where the bullet introduces a new grammatical sense of a homonymous word.
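
For anyone who wants to check the properties discussed in this message rather than take them from memory, here is a minimal sketch in Python using the standard `unicodedata` module. The `CANDIDATES` list and the helper names `describe` and `parse_ucd_line` are my own, chosen only for illustration. Note that the values come from whatever Unicode version is bundled with the Python in use, so some of them may differ from the 2004 records quoted in this thread.

```python
import unicodedata

# Code points mentioned in this message; the selection is only illustrative.
CANDIDATES = [0x00B7, 0x0140, 0x0387, 0x2022, 0x2027, 0x2219, 0x0307, 0x0323]


def describe(cp):
    """Return the UCD properties discussed here for one code point."""
    ch = chr(cp)
    return {
        "code": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "gc": unicodedata.category(ch),          # general category, e.g. Po, Lo
        "ccc": unicodedata.combining(ch),        # canonical combining class
        "bidi": unicodedata.bidirectional(ch),   # bidi class, e.g. ON, L
        "decomp": unicodedata.decomposition(ch), # empty string if none
    }


def parse_ucd_line(line):
    """Split one UnicodeData.txt record into the same fields as describe()."""
    fields = line.strip().split(";")
    return {
        "code": "U+" + fields[0],
        "name": fields[1],
        "gc": fields[2],
        "ccc": int(fields[3]),
        "bidi": fields[4],
        "decomp": fields[5],
    }


if __name__ == "__main__":
    for cp in CANDIDATES:
        p = describe(cp)
        print(f'{p["code"]} {p["name"]}: '
              f'GC={p["gc"]} CC={p["ccc"]} BD={p["bidi"]} {p["decomp"]}')
```

Feeding `parse_ucd_line` a record such as `2027;HYPHENATION POINT;Po;0;ON;;;;;N;;;;;` and comparing the result with `describe(0x2027)` is one way to see whether a quoted record still matches the current database.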

