Re: Continue:Glaring mistake in the code list for South Asian Script

Kent Karlsson Fri, 09 Sep 2011 18:04:09 -0700

On 2011-09-10 00:53, "delex r" <[email protected]> wrote:


> I figure out that Unicode has not addressed the sovereignty issues of a
> language

Which, I daresay, is irrelevant from a *character* encoding perspective.

> while trying to devise an ASCII like encoding system for almost all
> the characters and symbols used on earth. I am continuing with my observation
> of the glaring mistake done by Unicode by naming a South Asian Script as
> "Bengali". Here I would like to give certain information that I think will be
> of some help for Unicode in its endeavour to faithfully represent a Universal
> Character encoding standard truer to even micro-facts.
> 
> India is believed to have at least 1652 mother tongues out of which only 22

One list of languages in India is given in
http://www.ethnologue.com/show_country.asp?name=IN
(I did not count the number of entries)

> are recognized by the Indian Constitution as official languages for
> administrative communication among local governments and to the citizens. And
> the constitution has not explicitly recognized any official script. As Unicode
> has listed the languages and scripts, the Indian Constitution has also listed

Unicode does not list any languages at all. Ok, the CLDR subproject copies a
list of language codes from the IANA language subtag registry, which (in a
complex manner) takes its language codes from (among others) the ISO 639-3
registry, which largely is in sync with Ethnologue (as in the list above);
but I guess that is not what you referred to.

> the official languages ( In its 8th schedule). The first entry in that list is
> the Assamese language.  Assamese is a sovereign language with its own grammar

Which I don't think is in dispute at all.

> and "script" that contains some unique characters that you will not find in
> any of the scripts so far discovered by Unicode. At least 30 million people

Unicode (at this stage) does not do any "discovery". Unicode and ISO/IEC
10646 are driven by applications (proposals) to encode characters (and define
properties of characters).

> call it the "Assamese Script" and if provided with computers and internet

If you want to disunify the Bengali script (and characters) from Assamese,
you need to show, in a proposal document, that they really are different
scripts, and should not be unified as just different uses of the same
script.

> connection can bomb the Unicode e-mail address with confirmations. These

Hmm, an email bombing threat... I'm sure Sarasvati can find a way to block
those (or we may all simply file them away as spam).

> characters are, I repeat, the one that is given a Hexcode 09F0  and the other
> with 09F1 by this universal character encoding system but unfortunately
> has described both as "Bengali" Ra etc. etc. I don't know who has advised
> Unicode to use the tag "Bengali" to name the block that includes these two
> characters. 
> 
> If you are not an Indian then just google an image of an Indian Currency note.
> There on one side of the note you will find a box inside which the value of
> the currency note is written in words in at least 15 scripts of official
> Indian languages.( I don't know why it is not 22). At the top , the script is
> Assamese as Assamese is the first officially recognized language (script?) .
> Next below it you will find almost similar shapes. That is in Bengali. India
> officially recognises the distinction between these two scripts which although
> shaped similar but sounds very different at many points. And the standard

Minor font differences are not a reason for disunification. Different
pronunciations of the same letters are not a reason for disunification
either. Just think of how many different ways Latin letters (and letter
combinations) are pronounced in different languages (x, j, h, v, w, f, ...;
even "a" gets different pronunciation in British English vs. US English,
and that is within the same language...; and most orthographies aren't
very accurately phonetic anyway, with quite a bit of varying (contextual
and dialectal) pronunciation for the letters).
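
For reference, the formal names of the two characters under discussion
(U+09F0 and U+09F1) can be looked up with Python's standard unicodedata
module. This is only a minimal sketch; the helper name describe() is made up
for illustration, and the output depends on the Unicode data shipped with the
Python version in use.

```python
import unicodedata

def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME (general category)' for one code point."""
    ch = chr(code_point)
    name = unicodedata.name(ch)          # formal Unicode character name
    category = unicodedata.category(ch)  # general category, e.g. 'Lo'
    return f"U+{code_point:04X} {name} ({category})"

# The two code points mentioned in the quoted message.
for cp in (0x09F0, 0x09F1):
    print(describe(cp))
```

The names printed are identifiers for the characters; they say nothing
about which languages may use them.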

> assamese alphabet set has extra characters which are never bengali just like
> London is never in Germany.

There are 8 Londons in the USA, two in Canada, one in Kiribati, ... ;-)
(http://en.wikipedia.org/wiki/London_(disambiguation))

> Coming again to the Hexcodes 09F0 (Raw) and 09F1 (wabo). Both have nothing
> Bengali in them and interestingly 09F1 ( sounds WO or WA when used within
> words) has even nothing 'Ra' sound in it. Thus you know, with actual Bengali
> alphabet set one can't write anything to produce the sound "Watt" as in James
> Watt and instead need to combine three alphabets but even then only to sound
> like "OOYAT" in Bengali itself.

Yes, English has a rather peculiar pronunciation for the letter W... ;-)
Several languages will pronounce Watt (without changing the spelling) as
Vatt, and regard that as a normal pronunciation of Watt.

> Therefore Unicode must consider terming the block range as "Assamese" which
> will faithfully describe the block range with 09F0 and 09F1 in it and replace
> all tags "Bengali" with "Assamese" in the code descriptions and vice versa .
> London is in England and Berlin is in Germany. You just can't bring London
> into Germany and then say England is in Germany. You can't live with a lie or
> wrong too long.

See above re. London. ;-) As for Berlin: see
http://en.wikipedia.org/wiki/Berlin_(disambiguation)...
(I still fail to see how this would be analogous in any way whatsoever to
your quest.) 


Yes, I have responded with a quite large dose of irony. Drier and more
to-the-point responses by others seem to have passed unnoticed.

    /Kent K
