Keld responded:

> On Fri, Aug 09, 2002 at 11:44:40PM +0100, Anto'nio Martins-Tuva'lkin wrote:
> >
> > Hm. But middle dot is not also a letter symbol. It's also used as a
> > bullet, a tab filling, even a box-drawing char. Shouldn't Unicode
> > provide a way to separate this duality?
>
> · has traditionally been used eg in word processors to visually display
> a blank character. But it was originally intended in ISO 8859-1 and
> other places for the Catalan language, which uses it in words such
> as paral·lel.
However, one cannot ignore the rest of the manifest history of this
character. It also has long occurred in Code Page 437 and myriad other
IBM and Microsoft Code Pages (IBM GCGID SD630000) with a long history
of ambiguous usage as punctuation and many other things.

> I think · is now listed in Unicode as a separator, and not
> as alphabetical.

It is actually listed with General Category Po (Punctuation, Other),
and not as one of the separator classes. But it also has the diacritic
property and the extender property, which most punctuation characters
do not. Property-based implementations can take advantage of other
properties of U+00B7 to distinguish it from most punctuation.

> I think that is an error. How can we correct it?

Changing it out of the General Category Po would disturb what by now
is already a long legacy practice for many implementations. It would
cause way more problems than the putative problem it is supposed to
fix for Catalan. (This despite the fact that unlike the Catalan
usage, which actually is more reminiscent of the delimiter usage
of a middle dot, as in dictionary syl·la·bi·fi·ca·tion, there are
actually quite a number of technically-based orthographies, in
the Americas, at least, which use a middle dot simply as a long
vowel diacritic.)

Word delimitation depends on more than merely the General Category
value, anyway, so appropriate word boundary determination can be
developed for Catalan and other languages regardless of the General
Category Po value for U+00B7. (See DUTR #29 on this.)

And for identifiers, it is up to particular implementations to
determine whether inclusion or exclusion of U+00B7 makes sense
for their identifier syntax. What is gained for Catalan by including
U+00B7 in identifiers may be offset by confusion that can set in
against the usage of U+00B7 as a delimiter punctuation, or as
a representation of middle dot operators in mathematical expressions.

--Ken

>
> Kind regards
> Keld
>
>
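As a rough illustration of the word-delimitation point, here is a minimal
Python sketch of a word splitter that leaves U+00B7 in General Category Po
but still keeps it inside a word when it sits directly between two letters.
The set JOINS_LETTERS and the function split_words are hypothetical names
made up for this example, and the rule is a simplification for illustration,
not the algorithm specified in DUTR #29.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"

# Assumption: a hand-picked set of punctuation allowed to join letters.
# A real implementation would derive this from Unicode property data.
JOINS_LETTERS = {MIDDLE_DOT}


def is_letter(ch: str) -> bool:
    """True for characters whose General Category starts with L."""
    return unicodedata.category(ch).startswith("L")


def split_words(text: str) -> list[str]:
    """Split text into words, keeping a joiner only between two letters."""
    words, current = [], []
    for i, ch in enumerate(text):
        joins = (
            ch in JOINS_LETTERS
            and 0 < i < len(text) - 1
            and is_letter(text[i - 1])
            and is_letter(text[i + 1])
        )
        if is_letter(ch) or joins:
            current.append(ch)
        elif current:
            # Any other character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# The General Category is unchanged: U+00B7 is still Po.
print(unicodedata.category(MIDDLE_DOT))
# Catalan word stays whole; the free-standing dot acts as a delimiter.
print(split_words("paral\u00b7lel \u00b7 bullet"))
```

The same shape would work for an identifier syntax: whether U+00B7 goes
into the joiner set is a per-implementation policy choice, separate from
its General Category.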

