2012/6/19 Naena Guru <[email protected]>:
> Unicode Sinhala:
> http://ahangama.com/sing/DBS.htm (4 kB)
> Romanized Singhala:
> http://ahangama.com/sing/DSS.htm (1 kB)
>
> Compare the shape formation and the sizes of the files. How much bandwidth
> is taken for the Unicode Sinhala file to go as UTF-8? 6kB!

Your stats are grossly skewed. Your demo page does not even use UTF-8 to represent the Sinhalese letters: it uses decimal NCRs such as &#3520; (ව). That is 7 bytes per character! On top of that, the DBS.htm version (the one using decimal NCRs) contains a lot of extra indentation spaces that are absent from your hacked DSS.htm page. That page also loads a WOFF font through a CSS style, yet your server does not even conform to the web standards when delivering this font: it sends incorrect MIME types.

You then claim that some browsers do things well and others do not. But the fault is yours: you do not follow the standards, and browsers have different, non-interoperable ways of coping with these non-standard inconsistencies. They are not wrong if they do not render your WOFF webfonts, notably when those fonts are not identified with the correct MIME types in the HTTP response.

Start by auditing your demo pages and resolving all the warnings reported by browsers (including those that prohibit further optimizations). Your test pages are simple enough that they should be easy to correct manually. You will see that the conforming Unicode version (DBS.htm) can be improved considerably.

In any case, my browser does NOT render any Sinhalese letter on your hacked DSS.htm page, but it still DOES render the Unicode version (DBS.htm) correctly, even though that version can be improved. Remove the extra spaces and newlines as you did in DSS.htm, and REALLY encode the page in UTF-8 instead of NCRs. Make sure that the CSS stylesheets get loaded before any JavaScript (in both versions). Fix the MIME types on your server, and finally fix your web server so that it obeys HTTP/1.1 session management, so that the proxies used by mobile networks can correctly apply transparent data compression to both versions. UTF-8 will then no longer even be a problem, as generic compressors will use less than one byte per character on typical Sinhalese texts that are consistently encoded in the source.
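
To put rough numbers on the encoding point, here is a minimal Python sketch that counts the bytes for the same text written as decimal NCRs, as real UTF-8, and as deflated UTF-8. The sample string and the helper name `ncr_encode` are my own assumptions for illustration; nothing here is taken from the actual DBS.htm or DSS.htm pages, and a repeated word compresses far better than real prose would, so treat the compressed figure as indicative only.

```python
import zlib


def ncr_encode(text: str) -> str:
    """Replace every non-ASCII character with a decimal NCR such as &#3520;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# Hypothetical sample (not from the demo pages): the five code points of the
# word "Sinhala" in Sinhala script plus a space, repeated 200 times.
word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"
sample = (word + " ") * 200

as_ncr = ncr_encode(sample).encode("ascii")   # what an NCR-only page sends
as_utf8 = sample.encode("utf-8")              # what a real UTF-8 page sends
deflated = zlib.compress(as_utf8, 9)          # what a compressing proxy could send

print("characters:      ", len(sample))
print("NCR bytes:       ", len(as_ncr))
print("UTF-8 bytes:     ", len(as_utf8))
print("deflated UTF-8:  ", len(deflated))
```

Each Sinhalese letter costs 7 bytes as a four-digit decimal NCR against 3 bytes in UTF-8, before any compression is applied.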

