On 4/Jun/2015 19:01, Leo Broukhis wrote:
>
> Along the same lines, we might need a MODIFIER LETTER HYPHEN, because, for
> example, the word ack-ack isn't decomposable into words, or even morphemes,
> "ack" and "ack".
>

I do think that U+2010 (HYPHEN) is miscategorised. I think it should have General Category = Pc (Connector Punctuation), not Pd (Dash Punctuation). That is, hyphens are connectors, not dashes. That would make it a "word" character.
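
To make the categorisation point concrete, here is a small sketch in Python, using only the standard unicodedata and re modules. The helper name words is made up for this illustration, and the behaviour of \w shown here is Python's own (letters, digits and the underscore), which is not necessarily how every regex engine defines a "word" character.

```python
import re
import unicodedata

# Print the General Category of the characters under discussion.
for ch in ("\u2010", "\u002D", "\u005F"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
# U+2010 HYPHEN: Pd
# U+002D HYPHEN-MINUS: Pd
# U+005F LOW LINE: Pc


def words(text):
    """Hypothetical helper: split text into runs of Python 'word' characters."""
    return re.findall(r"\w+", text)


# U+2010 is not a word character here, so the word is split in two.
print(words("ack\u2010ack"))  # ['ack', 'ack']

# LOW LINE (Gc = Pc) is a word character, so the run stays whole.
print(words("ack_ack"))       # ['ack_ack']
```

So, at least in this one engine, a hyphenated word falls apart where an underscore-joined one would not.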

Or, at the very least, U+2010 should have Word Break = MidNumLet (meaning it can occur in the middle of numbers or letters). UAX #29 says that U+2010 deliberately does *not* have Word Break = MidNumLet, though an implementation may treat it as if it did. (UAX #29 doesn't give any reasons for this decision. I can understand why U+002D (HYPHEN-MINUS) doesn't have Word Break = MidNumLet, due to its history of being used as a dash or minus sign, but U+2010 should never be used as a dash or minus sign, so I don't see the problem.)

But luckily, the miscategorisation of U+2010 hasn't led to any pressing practical problems, unlike the misuse of U+2019 for the apostrophe.

- Ted