On Sun, 7 Dec 2003, Peter Jacobi wrote:
> There is some mixup of lang and encoding tagging, which I didn't
> fully understand.
When lang is not explicitly specified, Mozilla resorts to 'inferring' the 'langGroup' ('script (group)' would have been a better term) from the page encoding. Because UTF-8 is script-neutral, it's important to specify 'lang' explicitly. Your page is in ISO-8859-1, so without lang specified it's assumed to be in the 'x-western' langGroup (well, Latin script). Anyway, this behavior changed slightly in the Windows version recently (I forget whether I committed that patch before or after 1.4), and each Unicode block is now assigned a default 'script'. The way fonts are picked by the Xft version of Mozilla makes it harder to do the equivalent on Linux.
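To make the point concrete, here is a minimal (hypothetical) markup fragment showing explicit 'lang' tagging, so a browser doesn't have to guess the script from the encoding. The language codes and text are just illustrative:

```
<!-- UTF-8 page: encoding alone says nothing about the script,
     so tag the document and any foreign-language runs explicitly. -->
<html lang="de">
  <body>
    <p>Grüße aus Zürich.</p>
    <p>In Japanese: <span lang="ja">こんにちは</span></p>
  </body>
</html>
```

With the 'lang' attributes present, a browser can pick script-appropriate fonts per element instead of falling back to an encoding-derived default.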
I know that font selection/composition is a terribly difficult business, and hard work, so improving things takes time.
Starting out with certain assumptions about fonts for certain encodings is clearly very helpful for speed. But I think that failing to (correctly) render a character that obviously belongs to one script and not another is a bad idea.
Years ago, I developed a very flexible system that was able to
start out with the user-selected font but would use another
font if the first font wasn't able to do the job. The basic
architecture was in many ways very simple, but it took quite
some time to get it right. Once I had this basic architecture,
all kinds of neat things became very easy. For details, see
the paper from the 7th Unicode Conference at:
http://www.ifi.unizh.ch/groups/mml/people/mduerst/papers/PS/FontComposition.ps.gz
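The fallback idea described above can be sketched in a few lines. This is only an illustration, not the system from the paper: the coverage test has_glyph() and the font names are hypothetical stand-ins for whatever glyph-coverage query a real font backend provides.

```python
def choose_font(ch, preferred, fallbacks, has_glyph):
    """Return the first font in [preferred] + fallbacks that covers ch,
    or None if no font does (a renderer would then show a missing-glyph
    box). has_glyph(font, ch) is an assumed coverage predicate."""
    for font in [preferred] + fallbacks:
        if has_glyph(font, ch):
            return font
    return None

def assign_fonts(text, preferred, fallbacks, has_glyph):
    """Map each character of text to the font chosen for it, starting
    from the user-selected font and falling back per character."""
    return [(ch, choose_font(ch, preferred, fallbacks, has_glyph))
            for ch in text]

# Toy coverage test for demonstration only: pretend "Latin" covers
# ASCII and "Other" covers everything else.
def toy_has_glyph(font, ch):
    return ch.isascii() if font == "Latin" else not ch.isascii()

runs = assign_fonts("aé!", "Latin", ["Other"], toy_has_glyph)
```

Once per-character font choice is isolated like this, the "neat things" follow naturally: adjacent characters resolved to the same font can be merged into runs and handed to the shaper together.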
Regards, Martin.