You haven't clarified what exactly the usage is; you've only asked what it
means to cover a script.

James Kass mentioned a font's OS/2 table. That is obsolete: as Khaled pointed
out, there has never been a clear definition of "supported", and practice has
been inconsistent. Moreover, the available bits were exhausted after Unicode
5.2, and we're now working on Unicode 11. Both Apple and Microsoft have started
to use 'dlng' and 'slng' values in the 'meta' table of OpenType fonts to convey
what a font is designed to support versus what it merely can support, a
distinction that the OS/2 table never allowed for, but that is actually more
useful. (I'd also point out that, in the upcoming Windows 10 feature update,
the 'dlng' entries in fonts are used to determine what preview strings to use
in the Fonts settings UI.) For scripts like Latin that have a large set of
characters, most of which see infrequent usage, there can still be a challenge
in characterizing the font, but the mechanism does provide flexibility in what
is declared.
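
For anyone who wants to inspect these values, a minimal sketch using Python's
fontTools (the font path is a placeholder; fonts without a 'meta' table are
still common):

    from fontTools.ttLib import TTFont

    font = TTFont("SomeFont.ttf")      # placeholder path
    if "meta" in font:
        meta = font["meta"].data       # dict: 4-char tag -> value
        # 'dlng'/'slng' hold comma-separated ScriptLangTags,
        # e.g. "Latn, Grek" or "Hans"
        print("designed for (dlng):", meta.get("dlng"))
        print("can support  (slng):", meta.get("slng"))
    else:
        print("no 'meta' table in this font")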

But again, you haven't said whether deciding what data to put into fonts is
your issue. If you are trying to determine whether a given font supports a
particular language, the OS/2 and 'meta' tables provide heuristics, with
'meta' being the recommended one; but the only way to know for absolute
certain is to compare an exemplar character list for the particular language
with the font's cmap table. Note, though, that this can only tell you that a
font _is able to support_ the language, which doesn't necessarily imply that
it's actually a good choice for users of that language. For example, every
font in Windows includes Basic Latin characters, but that definitely doesn't
mean that every one of them is useful for an English speaker. This is exactly
why the 'dlng' entry in the 'meta' table was created.
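
To make the exemplar-vs-cmap check concrete, a sketch with fontTools (the
Polish exemplar set here is abbreviated and illustrative; real lists would
come from CLDR's exemplar character data):

    from fontTools.ttLib import TTFont

    # Illustrative only: a real exemplar list would come from CLDR.
    POLISH_EXEMPLARS = set("aąbcćdeęfghijklłmnńoóprsśtuwyzźż")

    font = TTFont("SomeFont.ttf")   # placeholder path
    cmap = font.getBestCmap()       # codepoint -> glyph name
    missing = sorted(ch for ch in POLISH_EXEMPLARS if ord(ch) not in cmap)
    if missing:
        print("cannot support Polish; missing:", " ".join(missing))
    else:
        print("is able to support Polish (but may still be a poor choice)")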

Peter

-----Original Message-----
From: Unicode [mailto:[email protected]] On Behalf Of Adam Borowski 
via Unicode
Sent: Saturday, February 17, 2018 2:18 PM
To: [email protected]
Subject: metric for block coverage

Hi!
As part of the Debian fonts team's work, we're trying to improve our review of
fonts: ways to organize them, add metadata, pick which fonts are installed by
default and/or recommended to users, etc.

I'm looking for a way to determine a font's coverage of the available scripts.
It's probably reasonable to do this per Unicode block.  Also, it's a safe
assumption that a font which doesn't know a codepoint can't do any complex
shaping involving it, so looking at just codepoints should be adequate for our
purposes.

A naïve way would be to count codepoints present in the font vs the number of 
all codepoints in the block.  Alas, there's way too much chaff for such an 
approach to be reasonable: þ or ą count the same as LATIN TURNED CAPITAL LETTER 
SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X WITH CARON.
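
For concreteness, a minimal sketch of that naive tally (block ranges are
hard-coded here for illustration; a real tool would parse them from Unicode's
Blocks.txt):

    from fontTools.ttLib import TTFont

    BLOCKS = {                       # illustrative subset of Blocks.txt
        "Basic Latin":      (0x0000, 0x007F),
        "Latin Extended-A": (0x0100, 0x017F),
        "Runic":            (0x16A0, 0x16FF),
    }

    font = TTFont("SomeFont.ttf")    # placeholder path
    covered = set(font.getBestCmap())
    for name, (lo, hi) in BLOCKS.items():
        total = hi - lo + 1
        have = sum(1 for cp in range(lo, hi + 1) if cp in covered)
        print(f"{name}: {have}/{total} = {have/total:.0%}")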

Another idea would be to give every codepoint a weight equal to the number of
languages which currently use it.
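
A sketch of what that weighting could look like, with made-up language counts
(real ones might be derived from CLDR exemplar data) and the `covered` set
from the previous snippet:

    # Hypothetical per-codepoint language counts; not real data.
    LANG_COUNT = {ord("e"): 90, ord("ą"): 2, ord("þ"): 1}

    def weighted_coverage(covered, lo, hi):
        block = range(lo, hi + 1)
        total = sum(LANG_COUNT.get(cp, 0) for cp in block)
        have = sum(LANG_COUNT.get(cp, 0) for cp in block if cp in covered)
        return have / total if total else 0.0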

Too bad that wouldn't work for symbols, or for dead scripts: a good runic font
will have complete coverage of Elder Futhark, Anglo-Saxon futhorc, Younger
Futhark, and medieval runes, while only a completionist would care about the
Franks Casket runes or Tolkien's inventions.

I don't think I'm the first to have this question.  Any suggestions?


ᛗᛖᛟᚹ!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.
