> > It's *much* easier -- and, in the long term, safer -- for them to
> > select from the extensive inventory of characters available in Unicode and
> > to avoid using ASCII punctuation characters with redefined word-building
> > semantics.
>
> I don't get what you are saying here, why should people be limited to
> ASCII punctuation characters?

That isn't what Peter was saying; you have misread him.

The recommendation that Peter was making is that people devising
orthographies for languages should stick to Unicode letters for the
letters of their orthography. (If the script in question is Latin, as
it is for most new orthographies, then there are *hundreds* of Latin
letters to choose from in the standard.)

What orthography developers should avoid is using characters like
"7", "@", "!", "$", "'" and so on as letters of their orthography, since
those are certain to cause all kinds of havoc with word-break and
other processes in standard software -- or even lead to absurdities
such as people wanting illegal constructs like:
'jo'@Abr@c@d@br@.com, which locales can*not* fix.

Just as choices about rational orthographies used to have to take
ease of use on typewriters as a major factor (to fail to do
so would be to condemn legions of people to wretched inefficiency) --
so choices about new rational orthographies should now be taking
ease of use on computers as a major factor. That is just
a realistic approach that any *serious* deviser of an orthography
should be taking into account.

> With GNU libc you can declare your own set
> of punctuation characters in the locale, and they can be any 10646
> character.

Peter was talking about the opposite case. But you should examine
carefully what the implications of your suggestion are. If I
were to make the absurd choice of picking 18 Chinese characters to
serve as my punctuation characters, and then went through the exercise
of declaring my own locale with GNU libc, I would only be guaranteeing
that my locale (and all my text data) would function correctly only
in a microscopic environment that I defined (or could browbeat a
few others to share).

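To make the word-break point above concrete, here is a minimal sketch
in Python. It is only an illustration under stated assumptions: the
regular expression `\w+` is a crude stand-in for whatever word-break
process a given piece of software really uses, and the sample words
and the helper names `words` and `categories` are made up for this
example.

```python
import re
import unicodedata

# Hypothetical sample text: the same two words spelled once with ASCII
# punctuation pressed into service as letters, and once with the
# corresponding Unicode letters (U+01C3 LATIN LETTER RETROFLEX CLICK,
# U+02BB MODIFIER LETTER TURNED COMMA).
ASCII_SPELLING = "!kung 'okina"
UNICODE_SPELLING = "\u01c3kung \u02bbokina"


def words(text):
    """Split text into 'words' using the regex class \\w.

    This is a deliberately naive stand-in for a word-break process;
    it is not the Unicode text segmentation algorithm.
    """
    return re.findall(r"\w+", text)


def categories(text):
    """Return the Unicode general category of each non-space character."""
    return [unicodedata.category(ch) for ch in text if not ch.isspace()]
```

With these definitions, `words(ASCII_SPELLING)` drops the leading "!"
and "'" because both have general category Po (punctuation), while
`words(UNICODE_SPELLING)` keeps both words whole, since U+01C3 is Lo
and U+02BB is Lm, i.e. letters. No locale setting is involved in
either result.
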
The reason for sticking to the Universal Character Set and for
sticking to standardized properties for the characters in that
set is to guarantee widespread interoperability and to guarantee
that my text, in my language, works correctly in all off-the-shelf
software -- not merely in my own hacked-up locale.

Serious orthography designers should not allow themselves to get
stuck in such dead-end traps.

--Ken

> Or are you referring to the specific locale syntax from
> POSIX/TR 14652?
>
> Kind regards
> Keld

