Romanized Singhala - Think about it again

2012-07-04 Thread Naena Guru
Pardon me for including a CC list. These are people who expressed opinions
for and against.

On this 4th of July, let me quote James Madison:
A zeal for different opinions concerning religion, concerning government,
and many other points, as well of speculation as of practice; an attachment
to different leaders ambitiously contending for pre-eminence and power; or
to persons of other descriptions whose fortunes have been interesting to
the human passions, have, in turn, divided mankind into parties, inflamed
them with mutual animosity, and rendered them much more disposed to vex and
oppress each other than to co-operate for their common good.

I gave much thought to why many here at the Unicode mailing list reacted
badly to my saying that Unicode solution for Singhala is bad. Earlier I
said the Plain Text idea is bad too. The responses came as attacks on *my*
solution than in defense of Unicode Singhala. The purpose of designating
naenaguru@gmail.com as a spammer is to prevent criticism. It is shameful
that a standards organization belonging to corporations of repute resorts
to censorship like bureaucrats and academics of little Lanka.

*I ask you to reconsider:*
As a way of explaining Romanized Singhala, I made some improvements to
www.LovataSinhala.com (http://www.lovatasinhala.com/). Mainly, it now has
near the top of each page a link that says, ’switch the script’. That
switches the base font of the body tag of the page between the Latin and
Singhala typefaces. *Please read the smaller page that pops up.*

I also verified that I hadn’t left any Unicode characters outside
ISO-8859-1 in the source code -- HTML, JavaScript or CSS. The purpose of
declaring the character set as iso-8859-1 than utf-8 is to avoid doubling
and trebling the size of the page by utf-8. I think, if you have characters
outside iso-8859-1 and declare the page as such, you get
Character-not-found for those locations. (I may be wrong).

Philippe Verdy, obviously has spent a lot of time researching the web site
and even went as far as to check the faults of the web service provider,
Godaddy.com. He called my font a hack font without any proof of it. It has
only characters relevant to romanized Singhala within the SBCS. Most of the
work was in the PUA and Look-up Tables. I am reminded of Inspector Clouseau
that has many gadgets and in the end finds himself as the culprit.

I will still read and try those other things Philippe suggests, when I get
time. What is important for me is to improve on orthography rules and add
more Indic languages -- Devanagari and Tamil coming up.

As for those who do not want to think rationally and think Unicode is a
religion, I can only point to my dilemma:
http://lovatasinhala.com/assayaa.htm

Have a Happy Fourth of July!


Re: Romanized Singhala - Think about it again

2012-07-04 Thread Doug Ewell

[removing cc list]

Naena Guru wrote:


On this 4th of July, let me quote James Madison:


[quote from Madison irrelevant to character encoding principles snipped]

I gave much thought to why many here at the Unicode mailing list 
reacted badly to my saying that Unicode solution for Singhala is bad.


Unicode encodes Latin characters in their own block, and Sinhala 
characters in their own block. Many of us disagree with a solution to 
encode Sinhala characters as though they were merely Latin characters 
with different shapes, and agree with the Unicode solution to encode 
them as separate characters. This is a technical matter.



Earlier I said the Plain Text idea is bad too.


And many of us disagree with that rather vehemently as well, for many 
reasons.


The responses came as attacks on *my* solution than in defense of 
Unicode Singhala.


It's not personal unless you wish to make it personal. You came onto the 
Unicode mailing list, a place unsurprisingly filled with people who 
believe the Unicode model is a superior if not perfect character 
encoding model, and claimed that encoding Sinhala as if it were Latin 
(and requiring a special font to see the Sinhala glyphs) is a better 
model. Are you really surprised that some people here disagree with you? 
If you write to a Linux mailing list that Linux is terrible and 
Microsoft Windows is wonderful, you will see pushback there too.


Here is a defense of Unicode Sinhala: it allows you, me, or anyone else 
to create, read, search, and sort plain text in Sinhala, optionally with 
any other script or combination of scripts in the same text, using any 
of a fairly wide variety of fonts, rendering engines, and applications.


The purpose of designating naenaguru@gmail.com as a spammer is to
prevent criticism.


The list administrator, Sarasvati, can speak to this issue. Every 
mailing list, every single one, has rules concerning the conduct of 
posters. I note that your post made it to the list, though, so I'm not 
sure what you're on about.


It is shameful that a standards organization belonging to corporations 
of repute resorts to censorship like bureaucrats and academics of 
little Lanka.


Do not attempt to represent this as a David and Goliath battle between 
the big bad Unicode Consortium and poor little Sri Lanka or its 
citizens. This is a technical matter.



I ask you to reconsider:
As a way of explaining Romanized Singhala, I made some improvements to 
www.LovataSinhala.com. Mainly, it now has near the top of each page a 
link that says, ’switch the script’. That switches the base font of 
the body tag of the page between the Latin and Singhala typefaces. 
Please read the smaller page that pops up.


The fundamental model is still one of representing Sinhala text using 
Latin characters, and relying on a font switch. It is still completely 
antithetical to the Unicode model.


I also verified that I hadn’t left any Unicode characters outside 
ISO-8859-1 in the source code -- HTML, JavaScript or CSS. The purpose 
of declaring the character set as iso-8859-1 than utf-8 is to avoid 
doubling and trebling the size of the page by utf-8. I think, if you 
have characters outside iso-8859-1 and declare the page as such, you 
get Character-not-found for those locations. (I may be wrong).


You didn't read what Philippe wrote. Representing Sinhala characters in 
UTF-8 takes *fewer* bytes, typically less than half, compared to using 
numeric character references like &#3523;&#3538;&#3458;&#3524;&#3517;
&#3517;&#3538;&#3520;&#3539;&#3512;&#3495; &#3465;&#3524;&#3517;.
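
The size claim can be checked with a short sketch. This assumes Python 3 and takes the first five code points from the references quoted above; the variable names are illustrative only.

```python
# Compare the storage cost of Sinhala text written as raw UTF-8
# with the same text written as HTML numeric character references.
word = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD"  # code points 3523, 3538, 3458, 3524, 3517

# Raw UTF-8: each character in the Sinhala block takes 3 bytes.
utf8_bytes = word.encode("utf-8")

# Numeric character references: "&#NNNN;" is 7 ASCII bytes per character here.
ncr_text = "".join(f"&#{ord(ch)};" for ch in word)
ncr_bytes = ncr_text.encode("ascii")

print(len(utf8_bytes), "bytes as UTF-8")
print(len(ncr_bytes), "bytes as numeric character references")
```

For this word the UTF-8 form is 15 bytes and the reference form is 35 bytes, which is consistent with "less than half".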


Philippe Verdy, obviously has spent a lot of time researching the web 
site and even went as far as to check the faults of the web service 
provider, Godaddy.com. He called my font a hack font without any proof 
of it.


A font that places glyphs for one character in the code space defined 
for a fundamentally different character is generally referred to as a 
hack (or hacked) font. A Latin-only font that placed a glyph looking 
like 'B' in the space reserved for 'A' would also be a hacked font.
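
A minimal sketch of why the code points matter more than the glyphs, assuming Python 3 and its standard unicodedata module. The romanized sample string is hypothetical and is not taken from the site under discussion.

```python
import unicodedata


def describe(text):
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch, "UNNAMED") for ch in text]


# Hypothetical romanized sample: stored as Latin letters, whatever font draws them.
romanized = "sinhala"

# The same word stored with characters from the Sinhala block.
encoded = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD"

for name in describe(romanized):
    print(name)   # names begin with "LATIN SMALL LETTER"
for name in describe(encoded):
    print(name)   # names begin with "SINHALA"
```

Software that searches, sorts, or spell-checks sees only the code points, so text stored as Latin letters is treated as Latin no matter which glyphs a font substitutes on screen.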


As for those who do not want to think rationally and think Unicode is 
a religion, I can only point to my dilemma:

http://lovatasinhala.com/assayaa.htm


You need to stop making this religion accusation. This is a technical 
matter.


This is the last attempt I will make to help show YOU where the water 
is.


--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell




Charset declaration in HTML (was: Romanized Singhala - Think about it again)

2012-07-04 Thread Otto Stolz

Hello Naena Guru,

on 2012-07-04, you wrote:

The purpose of
declaring the character set as iso-8859-1 than utf-8 is to avoid doubling
and trebling the size of the page by utf-8. I think, if you have characters
outside iso-8859-1 and declare the page as such, you get
Character-not-found for those locations. (I may be wrong).


You are wrong, indeed.

If you declare your page as ISO-8859-1, every octet
(aka byte) in your page will be understood as a Latin-1
character; hence you cannot have any other character
in your page. So, your notion of “characters outside
iso-8859-1” is completely meaningless.

If you declare your page as UTF-8, you can have
any Unicode character (even PUA characters) in
your page.

Regardless of the charset declaration of your page,
you can include both Numeric Character References
and Character Entity References in your HTML source,
cf., e.g., http://www.w3.org/TR/html401/charset.html#h-5.3.
These may refer to any Unicode character, whatsoever.
However, they will take considerably more storage space
(and transmission bandwidth) than the UTF-8 encoded
characters would take.
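
Both points can be sketched in a few lines, assuming Python 3 and its built-in "iso-8859-1" codec and html module. This illustrates the decoding model only; it does not reproduce any particular browser.

```python
import html

# Under ISO-8859-1 every one of the 256 byte values is a valid character,
# so decoding can never fail or yield a character outside Latin-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
in_latin1 = all(ord(ch) < 256 for ch in decoded)

# A Sinhala character cannot be stored directly in such a page...
try:
    "\u0DC3".encode("iso-8859-1")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

# ...but a numeric character reference is plain ASCII, so it survives
# the charset and still refers to the Unicode character.
source = b"&#3523;&#3538;"
text = html.unescape(source.decode("iso-8859-1"))

print(in_latin1, direct_ok, [hex(ord(ch)) for ch in text])
```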

Good luck,
  Otto Stolz





Re: Romanized Singhala - Think about it again

2012-07-04 Thread Philippe Verdy
2012/7/4 Naena Guru naenag...@gmail.com:
 Philippe Verdy, obviously has spent a lot of time

Not a lot of time... Sorry.

 researching the web site
 and even went as far as to check the faults of the web service provider,
 Godaddy.com.

I did not even note that your hosting provider was that company. I
just looked at the HTTP headers to look at the MIME type and charset
declarations. Nothing else.
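
A minimal sketch of reading the MIME type and charset out of a Content-Type header value, assuming Python 3 and the standard email.message module. The header strings are examples, not values captured from the site.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (MIME type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


# Example header values (assumed for illustration).
print(parse_content_type("text/html; charset=ISO-8859-1"))
print(parse_content_type("font/ttf"))
```

The charset comes back lower-cased, and a header without a charset parameter yields None for it.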

 He called my font a hack font without any proof of it.

It is really a hack. Your font assigns Sinhalese characters to Latin
letters (or some punctuations) of ISO 8859-1. It also assigns
contextual variants of the same abstract Sinhalese letters, to ISO
8859-1 codes, plus glyphs for some ligatures of multiple Sinhalese
letters to ISO 8859-1 codes, plus it reorders these glyphs so that
they no longer match the Sinhalese logical order.

Yes, this font is a hack because it pretends to be ISO 8859-1 when it
is not. It is a specific, distinct encoding that is neither ISO 8859-1
nor Unicode, but something that exists in NO existing
standard.

 It has
 only characters relevant to romanized Singhala within the SBCS. Most of the
 work was in the PUA and Look-up Tables. I am reminded of Inspector Clouseau
 that has many gadgets and in the end finds himself as the culprit.

And you have invented an Inspector Guru gadget for your private use on
your site, instead of developing a TRUE separate encoding that you
SHOULD NOT name ISO 8859-1. Try to do that, but be aware that the
ISO registry of 8-bit encodings is now frozen. You'll have to convince
the IANA registry to register your new encoding. For now it is
registered nowhere. This is a purely local creation for your site.

 I will still read and try those other things Philippe suggests, when I get
 time. What is important for me is to improve on orthography rules and add
 more Indic languages -- Devanagari and Tamil coming up.

 As for those who do not want to think rationally and think Unicode is a
 religion,

No. Unicode is a technical solution to a long-standing problem:
interoperability of standards using open technologies. Given that you
do not even want to develop your own encoding as a registered open
standard compatible with a lot of applications (remember that all new
web standards MUST now support Unicode in at least one of its standard
UTFs), you're just losing time here.

 I can only point to my dilemma:
 http://lovatasinhala.com/assayaa.htm

 Have a Happy Fourth of July!

Next time, don't cite me personally when trying to convince others that I
have supported or said something I did not write myself. You have
interpreted my words at your convenience, but I don't want to be
associated by name and publicly with your personal
interpretations. Even if I also have my own opinions, I don't want to
cite anyone else's opinions without just quoting his own sentences
(provided that these sentences were public or that I was authorized by
him to quote his sentences in other contexts).

Stop this abuse of personalities. Thanks.



Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-07-04 Thread Richard Wordingham
On Fri, 25 May 2012 12:34:01 -0700
Markus Scherer markus@gmail.com wrote:

 On Thu, May 24, 2012 at 5:36 PM, Richard Wordingham 
 richard.wording...@ntlworld.com wrote:
 
  I spotted two differences flicking through the end of the
  differences -
 
 
 Nice work! Please submit your findings via the Unicode reporting
 form http://www.unicode.org/reporting.html.

I've automated the check and have something like a 6 page list
of anomalies in level 4 weights, with anomalies for DUCET and for the
CLDR root locale. (Only one anomaly, affecting about 15 characters, is
common to both.  I have *not* listed all the characters and
contractions affected by the big, clearly systematic anomalies.) The
same anomalies are present in 6.2.0 drafts, which include
allkeys-6.2.0d2.txt. What's the mechanism for submitting the report? Do
I point to a temporary document on the web formatted according to
Unicode Technical Committee rules?

Richard.



Re: Romanized Singhala - Think about it again

2012-07-04 Thread Naena Guru
Philippe, ask your friends why ordinary people Anglicize if Unicode Sinhala
is so great. See just one of many community forums: http://elakiri.com

I know you do not care about a language of 15 million people, but it
matters to them.

On Wed, Jul 4, 2012 at 10:46 PM, Philippe Verdy verd...@wanadoo.fr wrote:

 You are alone to think that. Users of the Sinhalese edition of
 Wikipedia do not need your hack or even webfonts to use the website.
 It only uses standard Unicode, with very common web browsers. And it
 works as is.
 For users that are not pre-equipped with the necessary fonts and
 browsers, Wikipedia indicates this very useful site:
 http://www.siyabas.lk/sinhala_how_to_install_in_english.html

I have two guys here in the US who asked me to help get rid of the Unicode
Sinhala that I helped them install from that 'very useful site'. Copies of
this message go to them. Actually, you do not need their special
installation if you have Windows 7. Windows XP needs an update of Uniscribe,
and Vista too. Their installation programs are faulty and interfere with
your OS settings.



 This solves the problem at least for older version of Windows or old
 distributions of Linux (now all popular distributions support
 Sinhalese). No web fonts are even necessary (WOFT works only in
 Windows but not in older versions of Windows with old versions of IE).

You mean WEFT? Now TTF (OTF) are compressed into WOFF. I see that Microsoft
is finally supporting it. (At least my font downloads, or maybe it picks up
the font in my computer? Now I am confused.)


 Everything is covered : working with TrueType and OpenType, adding an
 IME if needed. And then navigating on standard Sinhalese websites
 encoded with Unicode.


Philippe, try making a web page with Unicode Sinhala.


 Note that for versions of Windows with browsers older than IE6 there is
 no support, only because these older versions did not have the
 necessary minimum support for complex scripts. The alternative is to
 use another browser such as Firefox, which uses its own independent
 renderer that does not depend on Windows Uniscribe support. But these
 users are now extremely rare. Almost everyone now uses at least XP for
 Windows (Windows 95/98 are definitely dead), or uses a Mac, or a
 smartphone, or another browser (such as Firefox, Chrome, Opera).

I agree.


 Nobody except you supports your tricks and hacks. You come really too
 late, trying to solve a problem that no longer exists, as it was
 solved long ago for Sinhalese.

Mine is a comprehensive solution. It is a transliteration. Ask users that
compared the two. Find ordinary Singhalese. They use Unicode Sinhala to
read news web sites. The rest of the time they Anglicize or write in
English.

Everything is covered here too, buddy. Adobe apps since 2004, Apple since
2004, Mozilla since 2006, All other modern browsers since 2010. MS Office
2010. Abiword, gNumeric, Linux all the works. IE 8,9 partial. IE 10 full.
So?


 2012/7/5 Naena Guru naenag...@gmail.com:
  Hi, Philippe. Thanks for keeping engaged in the discussion. Too little
  time spent could lead to misunderstanding.
 
 
  On Wed, Jul 4, 2012 at 3:42 PM, Philippe Verdy verd...@wanadoo.fr wrote:
 
  2012/7/4 Naena Guru naenag...@gmail.com:
   Philippe Verdy, obviously has spent a lot of time
 
  Not a lot of time... Sorry.
 
   researching the web site
   and even went as far as to check the faults of the web service provider,
   Godaddy.com.
 
  I did not even note that your hosting provider was that company. I
  just looked at the HTTP headers to look at the MIME type and charset
  declarations. Nothing else.
 
  I know that the browser tells it. It is not a big deal, WOFF is the
  compressed TTF, but TTF gets delivered. If and when GoDaddy fixes their
  problem, the pages get delivered faster. Or I can make that fix in a
  .htaccess file. No time!
 
 
   He called my font a hack font without any proof of it.
 
  It is really a hack. Your font assigns Sinhalese characters to Latin
  letters (or some punctuations) of ISO 8859-1.
 
  My font does not have anything to do with Singhalese characters if you
  mean Unicode characters. You are very confusing.
  A Character in this context is a datatype. In the 80s it was one byte in
  size and used to signal not to use in arithmetic. (We still did it to
  convert between Capitals and Simple forms.) In the Unicode character
  database, a character is a numerical position. A Unicode Sinhala
  character is defined in Hex [0D80 - 0DFF]. Unicode Sinhala characters
  represent an incomplete hotchpotch of ideas of letters, ligatures and
  signs. I have none of that in the font.
 
  I say and know that Unicode Sinhala is a failure. It inhibits use of
  Singhala on the computer and the network. I do not concern myself with
  fixing it because it cannot be fixed. The only thing I did in relation to
  it is to write an elaborate set of routines to *translate* (not map)
  between constructs of Unicode Sinhala