Re: Parsing TTF in memory in FontBox 3.x

Tilman Hausherr Tue, 10 Feb 2026 11:09:08 -0800

Yeah, FileSystemFontProvider.scanFonts() is time-sensitive. It was really
slow on machines with many fonts.

Re parsing specific tables, the current code already needs many of these
tables.

Tilman

On 10.02.2026 at 15:51, Daniel Gredler wrote:

we could get UnitsPerEm because we read that table in
HeaderTable.readHeaders()

Do you need the complete "head" or "hhea" table?

Just parts of the hhea table (ascender, descender, line gap). I'm not sure
it's a good idea to start down this path, though, as there will always be a
few more fields that will be useful to someone :-)


Instead of an on-demand mode, what do you think of adding the option of
table-level granularity to the parse request, e.g. instead of:

parser.parse(buffer)

you could call

parser.parse(buffer, NamingTable.TAG,
HeaderTable.TAG, OS2WindowsMetricsTable.TAG, HorizontalHeaderTable.TAG)

to select the specific tables that you need ahead of time.
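
To make the table-level idea concrete at the file-format level, here is a minimal sketch that scans the sfnt table directory once and keeps only the records for the requested tags. This is not FontBox code: the class name TableDirectorySketch and the method selectTables are hypothetical, and the sketch assumes a single-font sfnt buffer (not a TTC) with a well-formed directory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of FontBox: selects table records by tag.
class TableDirectorySketch {

    /** Returns tag -> {offset, length} for each requested table that is present. */
    static Map<String, long[]> selectTables(byte[] font, String... wantedTags) {
        Set<String> wanted = new HashSet<>(Arrays.asList(wantedTags));
        // ByteBuffer is big-endian by default, which matches the sfnt format.
        ByteBuffer buf = ByteBuffer.wrap(font);
        // numTables is a uint16 that follows the 4-byte sfnt version.
        int numTables = buf.getShort(4) & 0xFFFF;
        Map<String, long[]> selected = new LinkedHashMap<>();
        for (int i = 0; i < numTables; i++) {
            // 12-byte header, then 16 bytes per record: tag, checksum, offset, length.
            int record = 12 + 16 * i;
            String tag = new String(font, record, 4, StandardCharsets.US_ASCII);
            if (wanted.contains(tag)) {
                long offset = buf.getInt(record + 8) & 0xFFFFFFFFL;
                long length = buf.getInt(record + 12) & 0xFFFFFFFFL;
                selected.put(tag, new long[] {offset, length});
            }
        }
        return selected;
    }
}
```

A caller would pass the tags it cares about, e.g. selectTables(bytes, "name", "head", "OS/2", "hhea"), and then parse only the byte ranges that come back. Tables that are absent from the font are simply missing from the returned map.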

If FileSystemFontProvider.scanFonts() is very performance-sensitive, it
would probably need to continue using parseTableHeaders... but if there's a
little wiggle room, it might be able to use this table-level granularity as
well. Right now it's not only selecting specific tables to read, but also
the specific fields in each table that it wants to read (so even higher
granularity than just table-level).

Take care,

Daniel


On Tue, Feb 10, 2026 at 1:29 PM Tilman Hausherr <[email protected]> wrote:

On 10.02.2026 at 12:55, Daniel Gredler wrote:

Hi Tilman,

It looks like I may need to go back to 2.x for now. The on-demand table
loading removed in PDFBOX-5460 (
https://issues.apache.org/jira/browse/PDFBOX-5460) was important for me,
and the header-only mode added in PDFBOX-5847 (
https://issues.apache.org/jira/browse/PDFBOX-5847), which might have served
as a sort of replacement, doesn't include all of the information I'm
currently reading (e.g. units per em or horizontal header info), since it's
very focused on just supporting the needs of
FileSystemFontProvider.scanFonts(). Is a more generic on-demand load mode
completely off the table for 3.0?

Re your last question, I'd prefer not to touch that part due to the
problems we had when it existed. But we could get UnitsPerEm because we
read that table in HeaderTable.readHeaders(). What do you mean by
"horizontal header info"? Do you need the complete "head" or "hhea"
table? That would be too much. It might be easier for you to fork that
code and delete everything you don't need and just get the values you
want. And then use the official fontbox for the actual work.
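
For readers weighing the "just get the values you want" route, the following is a minimal stand-alone sketch, not FontBox code, that pulls unitsPerEm from the "head" table and ascender, descender and line gap from the "hhea" table directly out of an in-memory buffer. The class name BasicMetricsSketch and its methods are hypothetical. The field offsets follow the OpenType table layouts (unitsPerEm at byte 18 of "head"; ascender, descender, lineGap at bytes 4, 6, 8 of "hhea"), and the sketch assumes a single-font sfnt buffer rather than a TTC, with no bounds or checksum validation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone reader, not part of FontBox.
class BasicMetricsSketch {
    final int unitsPerEm;
    final short ascender;
    final short descender;
    final short lineGap;

    private BasicMetricsSketch(int unitsPerEm, short ascender, short descender, short lineGap) {
        this.unitsPerEm = unitsPerEm;
        this.ascender = ascender;
        this.descender = descender;
        this.lineGap = lineGap;
    }

    /** Reads only the four fields of interest; everything else is skipped. */
    static BasicMetricsSketch read(byte[] font) {
        ByteBuffer buf = ByteBuffer.wrap(font); // big-endian, as sfnt requires
        int head = findTable(font, buf, "head");
        int hhea = findTable(font, buf, "hhea");
        int unitsPerEm = buf.getShort(head + 18) & 0xFFFF; // uint16
        short ascender = buf.getShort(hhea + 4);           // int16 (FWORD)
        short descender = buf.getShort(hhea + 6);          // int16 (FWORD)
        short lineGap = buf.getShort(hhea + 8);            // int16 (FWORD)
        return new BasicMetricsSketch(unitsPerEm, ascender, descender, lineGap);
    }

    /** Returns the byte offset of the table with the given 4-character tag. */
    private static int findTable(byte[] font, ByteBuffer buf, String tag) {
        int numTables = buf.getShort(4) & 0xFFFF;
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i; // tag, checksum, offset, length
            String current = new String(font, record, 4, StandardCharsets.US_ASCII);
            if (tag.equals(current)) {
                // An in-memory byte[] cannot exceed int range, so int is enough here.
                return buf.getInt(record + 8);
            }
        }
        throw new IllegalArgumentException("Missing table: " + tag);
    }
}
```

The trade-off is the one raised earlier in the thread: this stays tiny only as long as the list of needed fields stays tiny, and the actual font handling would still go through the official FontBox classes.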

Tilman

Take care,

Daniel



On Mon, Feb 9, 2026 at 10:15 PM Daniel Gredler <[email protected]> wrote:

Actually, it looks like a JIRA entry won't be necessary. Switching to
OTFParser seems to do the trick for this file.

Further, I think the difference in behavior was due to the disappearance
of the "parseOnDemand" option, which I was using (when on-demand is enabled
in FontBox 2.x, this CFF validation doesn't run).

Thanks again for the pointers!

Daniel



On Mon, Feb 9, 2026 at 9:13 PM Daniel Gredler <[email protected]> wrote:

That's odd. It has a ".otf" file extension, and FontBox 2.x didn't seem
to have any issues with it.

I'll create a JIRA issue with more information, since it sounds like
feature parity is expected here (and I should be able to attach the
offending file).

Take care,

Daniel


On Mon, Feb 9, 2026 at 8:57 PM Tilman Hausherr <[email protected]> wrote:

On 09.02.2026 at 20:43, Daniel Gredler wrote:

Ah, got it -- thanks! This indeed fixed the EOF issue.

However, I'm now getting the following error: "True Type fonts using CFF
outlines are not supported"

This is while reading the Noto Sans CJK Regular font file. Is this an area
where FontBox 3.x functionality is more limited than FontBox 2.x was?

No, you should get the same exception in 2.0. If you don't have it as a
TTF file, it should work as a TTC file; however, the calls are different.

https://github.com/notofonts/noto-cjk/tree/main/Sans/OTC

https://github.com/notofonts/noto-cjk/tree/main/Sans

from the EmbeddedMultipleFonts.java example

    TrueTypeCollection ttc2 =
        new TrueTypeCollection(new File("c:/windows/fonts/batang.ttc"));
    PDType0Font font2 =
        PDType0Font.load(document, ttc2.getFontByName("Batang"), true); // Korean


TrueTypeCollection can also take an InputStream.

However I see from that website that ttf files are also available, if
you know which country you need.


Tilman



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
