Re: Indic Devanagari Query

John Cowan Wed, 29 Jan 2003 05:01:26 -0800

Keyur Shroff scripsit:

> Sentiments are attached to cultures, which may vary from one geographical
> area to another. So when one of the many languages falling under the same
> script dominates the entire encoding for the script, then other groups of
> people may feel that their language has not been represented properly in
> the encoding.


Indeed, they may have such beliefs, but those beliefs are based on two
incorrect notions: that what the charts show is normative, and that the
codepoint is the proper unit of processing.

> In Unicode many characters have been given codepoints regardless of the
> fact that the same character could have been rendered through some compose
> mechanism. 

In every case this was done for backward compatibility with existing
encodings.  No new codepoints of this type will be added in future.
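
To see which precomposed characters are involved, one can inspect the
canonical decomposition mappings in the Unicode Character Database. The
following is a minimal sketch using Python's standard unicodedata module;
the helper name canonical_decomposition is made up for illustration, and
the two sample characters (U+00E9 and Devanagari U+0929) are my own choice
of examples.

```python
import unicodedata


def canonical_decomposition(ch: str) -> list[str]:
    """Return the characters of ch's canonical decomposition mapping.

    Returns an empty list when ch has no canonical mapping.
    (Hypothetical helper, named here for illustration only.)
    """
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<compat>"; skip those.
    if not mapping or mapping.startswith("<"):
        return []
    return [chr(int(cp, 16)) for cp in mapping.split()]


# Print the mapping for one Latin and one Devanagari precomposed character.
for ch in ("\u00e9", "\u0929"):
    parts = [f"U+{ord(p):04X}" for p in canonical_decomposition(ch)]
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {parts}")
```

Both characters map to a base letter followed by a combining mark, which
is the "compose mechanism" the quoted text refers to.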

> That is why the text should be normalized to either a precomposed or a
> decomposed character sequence before further processing in
> operations like searching and sorting.

The Unicode Collation Algorithm already allows for this: canonically
equivalent sequences, whether precomposed or decomposed, collate identically.
In practice the algorithm will typically be tailored to take
language-specific rules into account.
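
For plain matching, as opposed to collation, normalizing both sides to
the same form is enough. A minimal sketch, again assuming Python's
standard unicodedata module; the function names are hypothetical, and
this does not implement the collation algorithm or any language-specific
tailoring.

```python
import unicodedata


def nfd(text: str) -> str:
    """Normalize text to Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


def canonically_equal(a: str, b: str) -> bool:
    """True when a and b are canonically equivalent strings."""
    return nfd(a) == nfd(b)


def find_normalized(haystack: str, needle: str) -> int:
    """Search after normalizing both strings.

    The returned index refers to the NFD form of haystack,
    not to the original string. Returns -1 when not found.
    """
    return nfd(haystack).find(nfd(needle))


# Precomposed and decomposed spellings compare equal once normalized.
print(canonically_equal("\u00e9", "e\u0301"))
print(canonically_equal("\u0929", "\u0928\u093c"))
```

Which form to normalize to is a design choice; NFD is used here only
because it makes the combining marks visible to the search.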

> Also, many times processing of text depends on the smallest addressable
> unit of that language. Again as discussed in earlier e-mails this may vary
> from one language to another in the same script. Consider a case when a
> language processor/application wants to count the number of characters in
> some text in order to find the number of keystrokes required to input the
> text.

This will not work without knowledge of the keyboard layout in any case.
Entering a Latin-1 character on the Windows U.S. keyboard takes 5 keystrokes,
yet each such character is represented by one or two Unicode characters
(precomposed, or a base letter plus a combining mark).
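
The mismatch is easy to demonstrate. The sketch below counts code points
in both normalization forms and compares them with a deliberately crude,
hypothetical keystroke model: one keystroke per ASCII character (ignoring
Shift) and five per other Latin-1 character, as in the figure above. The
model and the function names are assumptions for illustration, not a
description of any real input method.

```python
import unicodedata

# Assumed cost of a Latin-1 character on a U.S. layout, per the text above.
LATIN1_KEYSTROKES = 5


def code_point_counts(text: str) -> tuple[int, int]:
    """Return (NFC length, NFD length) of text, counted in code points."""
    composed = unicodedata.normalize("NFC", text)
    decomposed = unicodedata.normalize("NFD", text)
    return len(composed), len(decomposed)


def keystrokes_us_layout(text: str) -> int:
    """Estimate keystrokes under the hypothetical model described above."""
    total = 0
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) < 0x80:
            total += 1  # plain ASCII key (Shift is ignored in this model)
        elif ord(ch) <= 0xFF:
            total += LATIN1_KEYSTROKES
        else:
            raise ValueError(f"no keystroke model for U+{ord(ch):04X}")
    return total


word = "caf\u00e9"
print(code_point_counts(word), keystrokes_us_layout(word))
```

Neither code point count matches the keystroke estimate, and a different
keyboard layout would change the estimate without changing the text.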

-- 
Henry S. Thompson said, / "Syntactic, structural,               John Cowan
Value constraints we / Express on the fly."     [EMAIL PROTECTED]
Simon St. Laurent: "Your / Incomprehensible     http://www.reutershealth.com
Abracadabralike / schemas must die!"            http://www.ccil.org/~cowan
