On 03/25/2002 04:38:08 PM Kenneth Whistler wrote:

>Peter Constable asked:
>
>> U+0027 APOSTROPHE has a general category of Po; U+02BC MODIFIER LETTER
>> APOSTROPHE has a general category of Lm. I haven't checked how they
>> compare with regard to any other properties. I'm wondering what kinds of
>> text processes might be expected to distinguish between these (i.e. give
>> different results / behaviours for the two characters).
>
>Well, for starters: isLetter() and isIdentifier() should give different
>results. U+02BC should be part of identifiers by default -- it is part
>of the alphabet of some languages. On the other hand, U+0027 is very
>often a syntax character, used as a 'quote' mark to indicate delimitation
>of an identifier or other symbol.

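The property difference described in the quoted text can be checked with a short script. This is only a sketch: it uses Python's standard unicodedata and re modules, and treats str.isalpha() and str.isidentifier() as rough stand-ins for the isLetter() and isIdentifier() mentioned above (that equivalence is an assumption, not something stated in the thread). The sample word "ha'i" is made up for illustration.

```python
import re
import unicodedata

APOSTROPHE = "\u0027"           # U+0027 APOSTROPHE
MODIFIER_APOSTROPHE = "\u02BC"  # U+02BC MODIFIER LETTER APOSTROPHE


def describe(ch):
    """Return (general category, str.isalpha result) for one character."""
    return unicodedata.category(ch), ch.isalpha()


def words(text):
    """Split text into runs of Unicode word characters."""
    return re.findall(r"\w+", text)


# Hypothetical word with a glottal stop between two vowels,
# encoded once with each character.
with_punctuation = "ha" + APOSTROPHE + "i"
with_letter = "ha" + MODIFIER_APOSTROPHE + "i"

print(describe(APOSTROPHE))
print(describe(MODIFIER_APOSTROPHE))
print(words(with_punctuation))
print(words(with_letter))
print(with_punctuation.isidentifier(), with_letter.isidentifier())
```

With these definitions, the U+0027 version is split into two tokens and rejected as an identifier, while the U+02BC version stays one token, which is the kind of difference a word-based search or word count would also see.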
OK, both you and John mentioned identifiers. Let me ask a slightly different question: I'm thinking about all of our linguists who have existing data containing 0x27 to represent a glottal stop (some possibly also using it as a quotation mark / apostrophe), and I'm thinking about getting them to migrate to Unicode. I know that it would be good for them to encode this orthographic representation of glottal stop as U+02BC, but if they also use 0x27 for a quotation mark, it may not be so trivial to get their data converted correctly, and many might be inclined to just map 0x27 > U+0027. I'm trying to think of reasons to give them as to why they might not want to do this, and usability for identifiers isn't going to particularly grab the attention of many of them.

So, why might a linguist want to go through the extra effort to map 0x27 > U+02BC in exactly those contexts where it should map to that character and not to U+2019 or something else?

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>

