Apostrophe, hyphen, and various other punctuation by default continue a word, but this behavior may be overridden on a per-language basis. Heuristics or more sophisticated engines may be needed when the apostrophe is at the end of a word, as in <the peoples' choice>, since it is ambiguous. The modifier letter apostrophe, on the other hand, is always treated as a letter.
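
To make the default behavior described above concrete, here is a rough sketch in Python. It is a simplification for illustration only and not the rule set of the draft TR: the MID_LETTER set and the helper names is_letter and split_words are my own assumptions. Punctuation in MID_LETTER continues a word only when it sits between two letters, while anything with a letter general category (including "Lm", which covers U+02BC) always counts as part of a word.

```python
import unicodedata

# Punctuation assumed to continue a word when it sits between two letters.
# This set is an illustrative assumption, not the list from the draft TR.
MID_LETTER = {"\u0027", "\u2019", "\u002D"}


def is_letter(ch):
    # General categories starting with "L" are letters, including
    # modifier letters ("Lm") such as U+02BC MODIFIER LETTER APOSTROPHE.
    return unicodedata.category(ch).startswith("L")


def split_words(text):
    words, current = [], []
    for i, ch in enumerate(text):
        if is_letter(ch):
            current.append(ch)
        elif (ch in MID_LETTER and current
              and i + 1 < len(text) and is_letter(text[i + 1])):
            # Letter on both sides: the punctuation continues the word.
            current.append(ch)
        else:
            # Anything else ends the current word (if one is open).
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("lambda's and open-o's"))   # ["lambda's", 'and', "open-o's"]
print(split_words("the peoples' choice"))     # ['the', 'peoples', 'choice']
print(split_words("the peoples\u02bc choice"))  # word keeps the trailing U+02BC
```

Note that the sketch simply drops the trailing U+0027 in <the peoples' choice>, which is exactly the ambiguous case where a heuristic or a smarter engine would be needed. With U+02BC there is no ambiguity, since its category is "Lm" and it is kept as part of the word.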

This is as good a point as any to draw people's attention to a new proposed draft TR, Text Boundaries, at http://www.unicode.org/reports/tr29/. This is in the initial (proposed draft) stage, so there is opportunity for feedback on it.

Note: the grapheme cluster update was moved here from U3.2 to allow more time for feedback and tuning.

Mark
—————
Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com

----- Original Message -----
From: "Marco Cimarosti" <[EMAIL PROTECTED]>
To: "'Kenneth Whistler'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, March 26, 2002 02:24
Subject: RE: apostrophe vs. modifier letter apostrophe

> Kenneth Whistler wrote:
> > [...]
> > This is just the computer-age version of the age-old question as
> > to why a linguist would want to distinguish anything that functions
> > differently.
> >
> > For years back in the late 70's and early 80's, before I got my
> > first PC, I typed up index slips with a manual typewriter. That
> > manual typewriter had various custom keys welded on, so that I could
> > get schwas, open-o's, lambda's, dead-key commas above, and the like.
> > [...]
>
> I stop quoting here because I already collected enough instances of <'s> for
> making my point...
>
> It seems to me that a word such as "lambda's" is just an English plural noun
> (also spelled "lambdas"), so it should be allowed in identifiers, it should
> count as a unit for word selections, etc.
>
> Clearly, U+0027 (APOSTROPHE, general category "Po" = other punctuation) is
> not fit for this purpose, because it has the wrong category and because it
> is ambiguously used as a quotation mark.
>
> But neither U+2019 (RIGHT SINGLE *QUOTATION* MARK, general category "Pf" =
> final *punctuation*) seems fit for the purpose.
>
> So, why does the Unicode book suggest U+2019 as the preferred character for
> apostrophe? Wouldn't U+02BC (MODIFIER LETTER APOSTROPHE, general category
> "Lm" = modifier letter) be a better choice?
>
> _ Marco

