On Sun, 7 Dec 2003, Peter Jacobi wrote:
> There is some mixup of lang and encoding tagging, which I didn't
> fully understand.
When lang is not explicitly specified, Mozilla resorts to 'inferring' the 'langGroup' ('script (group)' would have been a better term) from the page encoding. Because UTF-8 is script-neutral, it's important to specify 'lang' explicitly. Your page is in ISO-8859-1, so without lang specified it's assumed to be in the 'x-western' langGroup (well, Latin script). Anyway, this behavior changed slightly in the Windows version recently (I forget whether I committed that patch before or after 1.4), and each Unicode block is now assigned a default 'script'. The way fonts are picked by the Xft version of Mozilla makes it harder to do the equivalent on Linux.
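To make the point concrete, here is a minimal (hypothetical) markup fragment showing explicit 'lang' tagging, so a browser doesn't have to guess the script from the encoding. The language codes and text are just illustrative:

```
<!-- UTF-8 page: encoding alone says nothing about the script,
     so tag the document and any foreign-language runs explicitly. -->
<html lang="de">
  <body>
    <p>Grüße aus Zürich.</p>
    <p>In Japanese: <span lang="ja">こんにちは</span></p>
  </body>
</html>
```

With the 'lang' attributes present, a browser can pick script-appropriate fonts per element instead of falling back to an encoding-derived default.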
I know that font selection/composition is a terribly difficult business, and hard work, so improving things takes time.
Starting out with certain assumptions about fonts for certain encodings is clearly very helpful for speed. But I think that failing to (correctly) render a character that obviously belongs to one script and not another is a bad idea.
Years ago, I developed a very flexible system that was able to
start out with the user-selected font but would use another
font if the first font wasn't able to do the job. The basic
architecture was in many ways very simple, but it took quite
some time to get it right. Once I had this basic architecture,
all kinds of neat things became very easy. For details, see
the paper from the 7th Unicode Conference at:
http://www.ifi.unizh.ch/groups/mml/people/mduerst/papers/PS/FontComposition.ps.gz
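The fallback idea described above can be sketched in a few lines. This is only an illustration, not the system from the paper: the coverage test has_glyph() and the font names are hypothetical stand-ins for whatever glyph-coverage query a real font backend provides.

```python
def choose_font(ch, preferred, fallbacks, has_glyph):
    """Return the first font in [preferred] + fallbacks that covers ch,
    or None if no font does (a renderer would then show a missing-glyph
    box). has_glyph(font, ch) is an assumed coverage predicate."""
    for font in [preferred] + fallbacks:
        if has_glyph(font, ch):
            return font
    return None

def assign_fonts(text, preferred, fallbacks, has_glyph):
    """Map each character of text to the font chosen for it, starting
    from the user-selected font and falling back per character."""
    return [(ch, choose_font(ch, preferred, fallbacks, has_glyph))
            for ch in text]

# Toy coverage test for demonstration only: pretend "Latin" covers
# ASCII and "Other" covers everything else.
def toy_has_glyph(font, ch):
    return ch.isascii() if font == "Latin" else not ch.isascii()

runs = assign_fonts("aé!", "Latin", ["Other"], toy_has_glyph)
```

Once per-character font choice is isolated like this, the "neat things" follow naturally: adjacent characters resolved to the same font can be merged into runs and handed to the shaper together.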
Regards, Martin.