On Saturday, April 17, 2004 10:28 PM UTC+1, António Martins-Tuválkin wrote:

>> As I wrote earlier, if you know the text under inspection is
>> Catalan, a very simple regular expression will deal with that. Any
>> half-decent Catalan word processor does it already, by the way.
>
> What about the odd Catalan phrase within a text in Guarani or
> Cherokee?

Then you do not know that the text under inspection is Catalan; the "if" does not hold, so you are not supposed to act accordingly. That is, nobody will blame you because a double click on col·legi does not select the whole word. Any reader can test their own word processor: please double-click the Catalan word above and check whether it is recognized as such, even when surrounded by bad English instead of Guarani!

> Unicode, do not forget, supposedly brings correctness to
> multilingual text...

And then? Are you trying to say that selecting a word in multilingual text should always do the "right thing"? You were merely dreaming, I believe, and you know it perfectly well, having posted less than 2 minutes ago the case of apostrophes, which is next to impossible to sort out in the average multilingual text. Furthermore, what "the right thing" is varies from person to person, so achieving perfection here is a mere dream. Or are you trying to make the point that inventing a new code point for · in Catalan would bring any added correctness to multilingual texts?

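For concreteness, here is a minimal sketch of the kind of "very simple regular expression" mentioned in the quoted message. It assumes Python and its standard `re` module. The names `CATALAN_WORD` and `words` are hypothetical, and the rule it encodes (a middle dot counts as part of a word only between two l's) is an illustration, not what any particular word processor does.

```python
import re

# Hypothetical pattern: a word is a run of letters, where U+00B7 MIDDLE DOT
# is accepted only when it sits directly between two l's (upper or lower case).
# [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
CATALAN_WORD = re.compile(r"(?:[^\W\d_]|(?<=[lL])\u00B7(?=[lL]))+")

def words(text):
    """Return the words of `text`, keeping l·l together as one word."""
    return CATALAN_WORD.findall(text)

print(words("el col\u00B7legi \u00E9s gran"))
```

A middle dot anywhere else (for example in "a·b") still splits the text, which is exactly why this only helps when the text is known to be Catalan.
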
It is certain that the compatibility encoding of U+0140 is not very welcome in my eyes, since:

- it is almost unused, but in case it does occur, programmers like me have to check for it; so it is just a waste of my time, I would say :-(
- someone who reads TUS and does not know Spanish usage in this respect might think that col·legi should be written coŀlegi, "co\u0140legi", because the former is not listed as a letter and only the latter is annotated as "Catalan", without any mention of the "right thing to do"
- the only advantage I can see, namely that typographers will design the middle dot in U+0140 raised relative to its position in U+00B7, is not exploited in practice. We even see a lot of fonts where the dot in U+0140 is not balanced between the two l's, which clearly shows that the majority of typographers have no idea about the use of this character and probably just build it as a compound of U+006C and U+00B7... Others use a reduced size for the dot in U+0140 (which is displeasing to my eyes). Only a few fonts provide U+0140 with a reduced width for the dot, which might be considered good typography.

Further note about typography: I have compared, in some widely available fonts, the layout of ŀl versus l·l and also the upper dot of the colon. I found that almost nobody uses the upper dot of the colon. One of the few that does, namely Linotype Palatino (I cite it since I generally consider it a nice design), uses the upper dot of the colon for ŀ. And the result is really ugly, because the dot is way too high (about 65% of the l-height), thanks to the modern habit of higher x-heights...

Antoine
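
To illustrate the "check for it" chore from the first bullet, here is a minimal sketch assuming Python and its standard `unicodedata` module. The helper name `fold_l_with_middle_dot` is hypothetical. It replaces only U+013F and U+0140 with the two-character sequences given by their compatibility decompositions, and leaves everything else untouched.

```python
import unicodedata

# The Unicode Character Database records the compatibility decomposition
# of U+0140 as U+006C followed by U+00B7.
print(unicodedata.decomposition("\u0140"))

# Hypothetical helper: fold only the two precomposed "L with middle dot"
# characters, so no other compatibility characters in the text are altered.
_FOLD = {0x013F: "L\u00B7", 0x0140: "l\u00B7"}

def fold_l_with_middle_dot(text):
    """Rewrite U+013F/U+0140 as L/l followed by U+00B7 MIDDLE DOT."""
    return text.translate(_FOLD)

print(fold_l_with_middle_dot("co\u0140legi") == "col\u00B7legi")
```

`unicodedata.normalize("NFKC", text)` gives the same result for these two characters, but it also folds every other compatibility character, which may be more than is wanted.
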

