Keyur Shroff scripsit:

> Sentiments are attached with cultures which may vary from one geographical
> area to another. So when one of the many languages falling under the same
> script dominate the entire encoding for the script, then other group of
> people may feel that their language has not been represented properly in
> the encoding.

Indeed, they may have such beliefs, but those beliefs are based on two
incorrect notions: that what the charts show is normative, and that the
codepoint is the proper unit of processing.

> In Unicode many characters have been given codepoints regardless of the
> fact that the same character could have been rendered through some compose
> mechanism.

In every case this was done for backward compatibility with existing
encodings. No new codepoints of this type will be added in future.

> That is why the text should be normalized to either pre-composed or
> de-composed character sequence before going for further processing in
> operations like searching and sorting.

The collation algorithm makes allowance for these points. It will be
quite typical to tailor the algorithm to take language-specific rules
into account.

> Also, many times processing of text depends on the smallest addressable
> unit of that language. Again as discussed in earlier e-mails this may vary
> from one language to another in the same script. Consider a case when a
> language processor/application wants to count the number of characters in
> some text in order to find number of keystrokes required to input the text.

This will not work without knowledge of the keyboard layout in any case.
Entering Latin-1 characters on the Windows U.S. keyboard requires 5
keystrokes each, but each is represented by one or two Unicode characters.

-- 
Henry S. Thompson said, / "Syntactic, structural,     John Cowan
Value constraints we / Express on the fly."           [EMAIL PROTECTED]
Simon St. Laurent: "Your / Incomprehensible           http://www.reutershealth.com
Abracadabralike / schemas must die!"                  http://www.ccil.org/~cowan
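
P.S. A minimal sketch of the normalization and character-counting points above, in Python and assuming only the standard unicodedata module. The helper name same_text is hypothetical, chosen for illustration. It shows that a precomposed and a decomposed spelling of the same character compare unequal as raw codepoint sequences, compare equal once both are normalized to one form, and have different codepoint counts, so a codepoint count is not a reliable count of "characters" or keystrokes.

```python
import unicodedata

# Two spellings of the same user-perceived character, e with acute accent.
precomposed = "\u00e9"    # single codepoint U+00E9
decomposed = "e\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper for illustration; `form` may be "NFC" or "NFD".
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Raw comparison works on codepoints, so the two spellings differ.
print(precomposed == decomposed)            # False

# After normalization to a common form they compare equal.
print(same_text(precomposed, decomposed))   # True

# The codepoint counts differ even though the text is "the same".
print(len(precomposed), len(decomposed))    # 1 2
```

This covers only canonical equivalence; language-specific ordering is a matter for collation tailoring, which the sketch does not attempt.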

