Romanized Singhala - Think about it again
Pardon me for including a CC list. These are people who showed for and against opinion.

On this 4th of July, let me quote James Madison:

"A zeal for different opinions concerning religion, concerning government, and many other points, as well of speculation as of practice; an attachment to different leaders ambitiously contending for pre-eminence and power; or to persons of other descriptions whose fortunes have been interesting to the human passions, have, in turn, divided mankind into parties, inflamed them with mutual animosity, and rendered them much more disposed to vex and oppress each other than to co-operate for their common good."

I gave much thought to why many here at the Unicode mailing list reacted badly to my saying that Unicode solution for Singhala is bad. Earlier I said the Plain Text idea is bad too. The responses came as attacks on *my* solution than in defense of Unicode Singhala. The purpose of designating naenaguru@gmail.com as a spammer is to prevent criticism. It is shameful that a standards organization belonging to corporations of repute resorts to censorship like bureaucrats and academics of little Lanka.

*I ask you to reconsider:*

As a way of explaining Romanized Singhala, I made some improvements to www.LovataSinhala.com (http://www.lovatasinhala.com/). Mainly, it now has near the top of each page a link that says, ‘switch the script’. That switches the base font of the body tag of the page between the Latin and Singhala typefaces. *Please read the smaller page that pops up.*

I also verified that I hadn’t left any Unicode characters outside ISO-8859-1 in the source code -- HTML, JavaScript or CSS. The purpose of declaring the character set as iso-8859-1 than utf-8 is to avoid doubling and trebling the size of the page by utf-8. I think, if you have characters outside iso-8859-1 and declare the page as such, you get Character-not-found for those locations. (I may be wrong).

Philippe Verdy, obviously has spent a lot of time researching the web site and even went as far as to check the faults of the web service provider, Godaddy.com. He called my font a hack font without any proof of it. It has only characters relevant to romanized Singhala within the SBCS. Most of the work was in the PUA and Look-up Tables. I am reminded of Inspector Clouseau that has many gadgets and in the end finds himself as the culprit. I will still read and try those other things Philippe suggests, when I get time.

What is important for me is to improve on orthography rules and add more Indic languages -- Devanagari and Tamil coming up.

As for those who do not want to think rationally and think Unicode is a religion, I can only point to my dilemma: http://lovatasinhala.com/assayaa.htm

Have a Happy Fourth of July!
Re: Romanized Singhala - Think about it again
[removing cc list]

Naena Guru wrote:

> On this 4th of July, let me quote James Madison:

[quote from Madison irrelevant to character encoding principles snipped]

> I gave much thought to why many here at the Unicode mailing list reacted badly to my saying that Unicode solution for Singhala is bad.

Unicode encodes Latin characters in their own block, and Sinhala characters in their own block. Many of us disagree with a solution to encode Sinhala characters as though they were merely Latin characters with different shapes, and agree with the Unicode solution to encode them as separate characters. This is a technical matter.

> Earlier I said the Plain Text idea is bad too.

And many of us disagree with that rather vehemently as well, for many reasons.

> The responses came as attacks on *my* solution than in defense of Unicode Singhala.

It's not personal unless you wish to make it personal. You came onto the Unicode mailing list, a place unsurprisingly filled with people who believe the Unicode model is a superior if not perfect character encoding model, and claimed that encoding Sinhala as if it were Latin (and requiring a special font to see the Sinhala glyphs) is a better model. Are you really surprised that some people here disagree with you? If you write to a Linux mailing list that Linux is terrible and Microsoft Windows is wonderful, you will see pushback there too.

Here is a defense of Unicode Sinhala: it allows you, me, or anyone else to create, read, search, and sort plain text in Sinhala, optionally with any other script or combination of scripts in the same text, using any of a fairly wide variety of fonts, rendering engines, and applications.

> The purpose of designating naenaguru@gmail.com as a spammer is to prevent criticism.

The list administrator, Sarasvati, can speak to this issue. Every mailing list, every single one, has rules concerning the conduct of posters. I note that your post made it to the list, though, so I'm not sure what you're on about.

> It is shameful that a standards organization belonging to corporations of repute resorts to censorship like bureaucrats and academics of little Lanka.

Do not attempt to represent this as a David and Goliath battle between the big bad Unicode Consortium and poor little Sri Lanka or its citizens. This is a technical matter.

> I ask you to reconsider: As a way of explaining Romanized Singhala, I made some improvements to www.LovataSinhala.com. Mainly, it now has near the top of each page a link that says, ‘switch the script’. That switches the base font of the body tag of the page between the Latin and Singhala typefaces. Please read the smaller page that pops up.

The fundamental model is still one of representing Sinhala text using Latin characters, and relying on a font switch. It is still completely antithetical to the Unicode model.

> I also verified that I hadn’t left any Unicode characters outside ISO-8859-1 in the source code -- HTML, JavaScript or CSS. The purpose of declaring the character set as iso-8859-1 than utf-8 is to avoid doubling and trebling the size of the page by utf-8. I think, if you have characters outside iso-8859-1 and declare the page as such, you get Character-not-found for those locations. (I may be wrong).

You didn't read what Philippe wrote. Representing Sinhala characters in UTF-8 takes *fewer* bytes, typically less than half, compared to using numeric character references like &#3523;&#3538;&#3458;&#3524;&#3517; &#3517;&#3538;&#3520;&#3539;&#3512;&#3495; &#3465;&#3524;&#3517;.

> Philippe Verdy, obviously has spent a lot of time researching the web site and even went as far as to check the faults of the web service provider, Godaddy.com. He called my font a hack font without any proof of it.

A font that places glyphs for one character in the code space defined for a fundamentally different character is generally referred to as a hack (or hacked) font. A Latin-only font that placed a glyph looking like 'B' in the space reserved for 'A' would also be a hacked font.

> As for those who do not want to think rationally and think Unicode is a religion, I can only point to my dilemma: http://lovatasinhala.com/assayaa.htm

You need to stop making this religion accusation. This is a technical matter. This is the last attempt I will make to help show YOU where the water is.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
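The size claim in the reply above can be checked directly. What follows is a minimal Python sketch, not part of the original thread. It assumes the references quoted in the message are ordinary HTML numeric character references in the full `&#NNNN;` form, and it uses only the standard library; the variable names are illustrative.

```python
import html

# The references quoted in the message, written in full "&#NNNN;" form
# (assumption: the leading "&" belongs to each reference).
ncr_text = (
    "&#3523;&#3538;&#3458;&#3524;&#3517; "
    "&#3517;&#3538;&#3520;&#3539;&#3512;&#3495; "
    "&#3465;&#3524;&#3517;"
)

# Resolve each reference to the Sinhala character it denotes.
sinhala_text = html.unescape(ncr_text)

# Size on the wire in each representation.
ncr_bytes = len(ncr_text.encode("ascii"))       # references are pure ASCII
utf8_bytes = len(sinhala_text.encode("utf-8"))  # raw UTF-8 encoded characters

print(f"characters: {len(sinhala_text)}")
print(f"numeric character references: {ncr_bytes} bytes")
print(f"UTF-8: {utf8_bytes} bytes")
```

Each Sinhala character costs three bytes in UTF-8 but seven bytes as a four-digit decimal reference, which is where the "less than half" figure comes from.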
Charset declaration in HTML (was: Romanized Singhala - Think about it again)
Hello Naena Guru,

on 2012-07-04, you wrote:

> The purpose of declaring the character set as iso-8859-1 than utf-8 is to avoid doubling and trebling the size of the page by utf-8. I think, if you have characters outside iso-8859-1 and declare the page as such, you get Character-not-found for those locations. (I may be wrong).

You are wrong, indeed.

If you declare your page as ISO-8859-1, every octet (aka byte) in your page will be understood as a Latin-1 character; hence you cannot have any other character in your page. So, your notion of “characters outside iso-8859-1” is completely meaningless.

If you declare your page as UTF-8, you can have any Unicode character (even PUA characters) in your page.

Regardless of the charset declaration of your page, you can include both Numeric Character References and Character Entity References in your HTML source, cf., e.g., http://www.w3.org/TR/html401/charset.html#h-5.3. These may refer to any Unicode character, whatsoever. However, they will take considerably more storage space (and transmission bandwidth) than the UTF-8 encoded characters would take.

Good luck,
Otto Stolz
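The point about octets can be illustrated with a short Python sketch. It is not from the thread: Python's `latin-1` codec stands in here for a strict ISO-8859-1 decoder (an assumption; a browser's handling of the label may differ in detail), and the sample word is built from Sinhala code points chosen only for illustration.

```python
# Every possible octet value decodes under ISO-8859-1, so no octet is
# ever "not found" when a page is declared with that charset.
all_octets = bytes(range(256))
decoded = all_octets.decode("latin-1")

# A sample Sinhala word; its code points lie far above U+00FF.
sinhala = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"
utf8_octets = sinhala.encode("utf-8")

# Read under the wrong declaration, the same octets silently become
# unrelated Latin-1 characters, one per octet.
misread = utf8_octets.decode("latin-1")

# Going the other way, Sinhala cannot be encoded as ISO-8859-1 at all.
try:
    sinhala.encode("latin-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(len(decoded), len(sinhala), len(misread), encodable)
```

Nothing is lost in the misreading itself: re-encoding `misread` as Latin-1 and decoding the result as UTF-8 gives back the original word, which is why a wrong declaration shows up as garbled Latin text rather than as missing characters.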
Re: Romanized Singhala - Think about it again
2012/7/4 Naena Guru naenag...@gmail.com:

> Philippe Verdy, obviously has spent a lot of time

Not a lot of time... Sorry.

> researching the web site and even went as far as to check the faults of the web service provider, Godaddy.com.

I did not even note that your hosting provider was that company. I just looked at the HTTP headers to look at the MIME type and charset declarations. Nothing else.

> He called my font a hack font without any proof of it.

It is really a hack. Your font assigns Sinhalese characters to Latin letters (or some punctuation) of ISO 8859-1. It also assigns contextual variants of the same abstract Sinhalese letters to ISO 8859-1 codes, plus glyphs for some ligatures of multiple Sinhalese letters to ISO 8859-1 codes, and it reorders these glyphs so that they no longer match the Sinhalese logical order. Yes, this font is a hack because it pretends to be ISO 8859-1 when it is not. It is a specific distinct encoding which is neither ISO 8859-1 nor Unicode, but something that exists in NO existing standard.

> It has only characters relevant to romanized Singhala within the SBCS. Most of the work was in the PUA and Look-up Tables. I am reminded of Inspector Clouseau that has many gadgets and in the end finds himself as the culprit.

And you have invented an Inspector Guru gadget for your private use on your site, instead of developing a TRUE separate encoding that you SHOULD NOT name ISO 8859-1. Try to do that, but be aware that the ISO registry of 8-bit encodings is now frozen. You'll have to convince the IANA registry to register your new encoding. For now it is registered nowhere. This is a purely local creation for your site.

> I will still read and try those other things Philippe suggests, when I get time. What is important for me is to improve on orthography rules and add more Indic languages -- Devanagari and Tamil coming up. As for those who do not want to think rationally and think Unicode is a religion,

No. Unicode is a technical solution for a long-standing problem: interoperability of standards using open technologies. Given that you do not want to even develop your own encoding as a registered open standard compatible with a lot of applications (remember that all new web standards MUST now support Unicode in at least one of its standard UTFs), you're just losing time here.

> I can only point to my dilemma: http://lovatasinhala.com/assayaa.htm Have a Happy Fourth of July!

Next time, don't cite me personally when trying to convince others that I have supported or said something I did not write myself. You have interpreted my words at your convenience, but I don't want to be associated by name and publicly with your personal interpretations. Even if I also have my own opinions, I don't want to cite anyone else's opinions without just quoting his own sentences (provided that these sentences were public or that I was authorized by him to quote his sentences in other contexts). Stop this abuse of personalities. Thanks.
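The "hack font" objection comes down to character identity: software sees code points, not glyphs. The following Python sketch is an illustration under stated assumptions rather than anything taken from the site or the thread. The string `romanized` is a hypothetical sample, and `in_sinhala_block` is a helper named here for the example, using the U+0D80 to U+0DFF range of the Sinhala block.

```python
import unicodedata

def in_sinhala_block(ch: str) -> bool:
    """Return True if ch is a code point in the Sinhala block (U+0D80..U+0DFF)."""
    return 0x0D80 <= ord(ch) <= 0x0DFF

latin = "a"
sinhala = "\u0dc3"

# The character database identifies each code point independently of any font.
latin_name = unicodedata.name(latin)
sinhala_name = unicodedata.name(sinhala)

# Hypothetical romanized text: whatever glyphs a font draws for it, every
# stored character is still a Latin letter to search, sort, and spell-check.
romanized = "sinhala"
has_sinhala = any(in_sinhala_block(ch) for ch in romanized)

print(latin_name)
print(sinhala_name)
print(has_sinhala)
```

A font switch changes only the shapes drawn for `romanized`; the check on the stored characters gives the same answer either way.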
Re: CaseFirst and CaseLevel Tailorings of UCA and LDML
On Fri, 25 May 2012 12:34:01 -0700 Markus Scherer markus@gmail.com wrote:

> On Thu, May 24, 2012 at 5:36 PM, Richard Wordingham richard.wording...@ntlworld.com wrote:
>
>> I spotted two differences flicking through the end of the differences -
>
> Nice work! Please submit your findings via the Unicode reporting form (http://www.unicode.org/reporting.html).

I've automated the check and have something like a 6 page list of anomalies in level 4 weights, with anomalies for DUCET and for the CLDR root locale. (Only one anomaly, affecting about 15 characters, is common to both. I have *not* listed all the characters and contractions affected by the big, clearly systematic anomalies.) The same anomalies are present in 6.2.0 drafts, which include allkeys-6.2.0d2.txt.

What's the mechanism for submitting the report? Do I point to a temporary document on the web formatted according to Unicode Technical Committee rules?

Richard.
Re: Romanized Singhala - Think about it again
Philippe, ask your friends why ordinary people Anglicize if Unicode Sinhala is so great. See just one of many community forums: http://elakiri.com

I know you do not care about a language of 15 million people, but it matters to them.

On Wed, Jul 4, 2012 at 10:46 PM, Philippe Verdy verd...@wanadoo.fr wrote:

> You are alone to think that. Users of the Sinhalese edition of Wikipedia do not need your hack or even webfonts to use the website. It only uses standard Unicode, with very common web browsers. And it works as is. For users that are not pre-equipped with the necessary fonts and browsers, Wikipedia indicates this very useful site: http://www.siyabas.lk/sinhala_how_to_install_in_english.html

I have two guys here in the US that asked me to help get rid of Unicode Sinhala that I helped them install from that 'very useful site'. Copies of this message go to them. Actually, you do not need their special installation if you have Windows 7. Windows XP needs update of Uniscribe, and Vista too. Their installation programs are faulty and interfere with your OS settings.

> This solves the problem at least for older versions of Windows or old distributions of Linux (now all popular distributions support Sinhalese). No web fonts are even necessary (WOFT works only in Windows but not in older versions of Windows with old versions of IE).

You mean WEFT? Now TTF (OTF) are compressed into WOFF. I see that Microsoft is finally supporting it. (At least my font downloads, or maybe it picks up the font in my computer? Now I am confused.)

> Everything is covered: working with TrueType and OpenType, adding an IME if needed. And then navigating on standard Sinhalese websites encoded with Unicode.

Philippe, try making a web page with Unicode Sinhala.

> Note that for versions of Windows with IE versions older than IE6 there is no support, only because these older versions did not have the necessary minimum support for complex scripts. The alternative is to use another browser such as Firefox, which uses its own independent renderer that does not depend on Windows Uniscribe support. But these users are now extremely rare. Almost everyone now uses at least XP for Windows (Windows 95/98 are definitely dead), or uses a Mac, or a smartphone, or another browser (such as Firefox, Chrome, Opera).

I agree.

> Nobody except you supports your tricks and hacks. You come really too late, trying to solve a problem that no longer exists, as it has long been solved for Sinhalese.

Mine is a comprehensive solution. It is a transliteration. Ask users that compared the two. Find ordinary Singhalese. They use Unicode Sinhala to read news web sites. The rest of the time they Anglicize or write in English. Everything is covered here too, buddy. Adobe apps since 2004, Apple since 2004, Mozilla since 2006, All other modern browsers since 2010. MS Office 2010. Abiword, gNumeric, Linux all the works. IE 8,9 partial. IE 10 full. So?

> 2012/7/5 Naena Guru naenag...@gmail.com:
>
>> Hi, Philippe. Thanks for keeping engaged in the discussion. Too little time spent could lead to misunderstanding.
>>
>> On Wed, Jul 4, 2012 at 3:42 PM, Philippe Verdy verd...@wanadoo.fr wrote:
>>
>>> 2012/7/4 Naena Guru naenag...@gmail.com:
>>>
>>>> Philippe Verdy, obviously has spent a lot of time
>>>
>>> Not a lot of time... Sorry.
>>>
>>>> researching the web site and even went as far as to check the faults of the web service provider, Godaddy.com.
>>>
>>> I did not even note that your hosting provider was that company. I just looked at the HTTP headers to look at the MIME type and charset declarations. Nothing else.
>>
>> I know that the browser tells it. It is not a big deal, WOFF is the compressed TTF, but TTF gets delivered. If and when GoDaddy fixes their problem, the pages get delivered faster. Or I can make that fix in a .htaccess file. No time!
>>
>>>> He called my font a hack font without any proof of it.
>>>
>>> It is really a hack. Your font assigns Sinhalese characters to Latin letters (or some punctuation) of ISO 8859-1.
>>
>> My font does not have anything to do with Singhalese characters if you mean Unicode characters. You are very confusing. A Character in this context is a datatype. In the 80s it was one byte in size and used to signal not to use in arithmetic. (We still did it to convert between Capitals and Simple forms.) In the Unicode character database, a character is a numerical position. A Unicode Sinhala character is defined in Hex [0D80 - 0DFF]. Unicode Sinhala characters represent an incomplete hotchpotch of ideas of letters, ligatures and signs. I have none of that in the font. I say and know that Unicode Sinhala is a failure. It inhibits use of Singhala on the computer and the network. I do not concern myself with fixing it because it cannot be fixed. Only thing I did in relation to it is to write an elaborate set of routines to *translate* (not map) between constructs of Unicode Sinhala