Hi everybody,

There was recently something of a controversy regarding Embedded OpenType (EOT) support in WebKit. The most important reason to support this technology is not web designers who want custom fonts, but that some sites built on legacy technology use a custom encoding, paired with a custom embedded font, to display their non-Latin characters. Most of these sites are in India or use Indic languages.
I am very much not an expert in this area. My goal is to start a discussion about "what to do about Indic compatibility" in WebKit, rather than "should EOT be supported." Just supporting EOT in WebKit would make these sites appear correctly, but it would not address some of the basic problems like copy and paste or Google Chrome's full-text indexing feature.

Waiting for the sites to fix themselves / evangelism (basically what all browsers are doing now) is an option, and has apparently had some success. But some sites seem to be stuck on old technology, so not using Unicode may not be their choice. Sticking with this plan may make WebKit adoption possible in the long term, but would not help very much in the short term.

Google Search does some special detection to transcode sites that use these custom encodings. One approach would be to do the same in the browser: the browser would contain a list of problem domains, and a character map table that maps each custom 8-bit encoding to Unicode (hopefully there are many fewer encodings than sites). Alternatively, it could key off the font name, if these turn out to be unique enough to identify the encoding (does anybody know if this is the case?). All incoming pages would first be checked against this list, and a match would trigger the converter. I found a list of ~100 popular sites that require special encodings that we could start with.

Doing this conversion has several challenges:

- It could not be blindly applied to all pages on a site. Many of the sites have English pages which we wouldn't want to convert, and if a site ever fixes itself to use a standard encoding, we would want to be able to pick that up automatically. Some pages declare the charset as "x-user-defined", while others list something else (I saw ISO-8859-1, but there may be more). I think there would need to be a somewhat smart encoding detector here (like the auto charset detection we have today).
- It could not be blindly applied to all content in a single page. Many of the pages are a mix of custom-encoded text using an EOT font and English (or other-language) text using a different font. For example, see http://www.futuresamachar.com/fs/hindi/index.htm ("Duration", "By Post", etc. on the right are styled with "Verdana" to get the regular encoding, and would be corrupted if a transcoder were applied to the entire page). This makes me wonder what integration with WebKit would look like: being dependent on CSS means the conversion couldn't just happen in the normal character set conversion phase during parsing.

Are there other approaches that WebKit-based browsers could take to get better compatibility with Indic sites? What problems do people more familiar with this area see with the transcoding approach? Could it be implemented cleanly, and would a whitelist ever have a hope of covering the sites that Indian users care about? Or should we continue with evangelism and wait?

Brett
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
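P.S. To make the whitelist + character-map idea concrete, here is a minimal sketch in Python. The domain, encoding name, and byte-to-codepoint mappings are all made up for illustration; a real table would have to come from surveying the actual sites and fonts, and a real implementation would live in the browser's charset-conversion path, not in a standalone function.

```python
# Sketch of the whitelist + character-map transcoding idea.
# All names and mappings below are hypothetical placeholders.

# Known problem domains mapped to the custom 8-bit encoding they use.
SITE_ENCODINGS = {
    "www.example-hindi-site.in": "x-custom-devanagari",  # illustrative entry
}

# Per-encoding map from custom byte values to Unicode characters.
# Bytes not listed fall through unchanged (ASCII punctuation, digits, etc.).
CHARMAPS = {
    "x-custom-devanagari": {
        0xC1: "\u0905",  # DEVANAGARI LETTER A (illustrative mapping)
        0xC2: "\u0906",  # DEVANAGARI LETTER AA (illustrative mapping)
    },
}

def transcode(raw: bytes, domain: str) -> str:
    """Convert page bytes to Unicode if the domain is on the whitelist."""
    encoding = SITE_ENCODINGS.get(domain)
    if encoding is None:
        # Not a known problem site: fall back to normal charset handling
        # (stand-in here; the real path would honor the declared charset).
        return raw.decode("latin-1")
    charmap = CHARMAPS[encoding]
    # Map each byte through the table, passing unmapped bytes through.
    return "".join(charmap.get(b, chr(b)) for b in raw)
```

Note this sketch transcodes the whole byte stream, which is exactly the "can't be blindly applied to all content in a single page" problem above: a real version would have to be driven by which font CSS assigns to each text run.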

