Hi!
First off, thanks for an excellent library. It's been a pleasure to work
with something where the many mysteries of PDFs are tucked away behind
an easy-to-use interface.
What I'm working on is filling forms. In particular, large forms into
which we sometimes need to put a variety of languages. This has mostly
worked fine, but I've noticed a speed issue. As an example, take the US
I-130: https://www.uscis.gov/system/files_force/files/form/i-130.pdf
If I go through this and fill every field with roman text using the
default font, it takes circa 2 seconds, which is fine. If I fill it with
an added Arabic font, it takes circa 7 seconds. And if I use a CJK font,
it takes circa 140 seconds, which seems like a lot. This is with PDFBox
2.0.14 and the Oracle 1.8.201 JDK on Linux.
One interesting symptom is that I see this log message (at INFO level, so
not strictly an error) every time I fill a text field:
Apr 05, 2019 8:16:51 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font NotoSansCJKsc-Medium are not implemented in PDFBox and will be ignored
Apr 05, 2019 8:16:51 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font NotoSansCJKsc-Medium are not implemented in PDFBox and will be ignored
Apr 05, 2019 8:16:52 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font NotoSansCJKsc-Medium are not implemented in PDFBox and will be ignored
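Since the message is logged from the PDCIDFontType2 constructor (`<init>`), counting it is a cheap way to compare the number of font loads with the number of fields filled. Below is a minimal sketch using only java.util.logging. It assumes PDFBox's logging ends up in java.util.logging (the timestamp format above suggests the JDK's default formatter), and `FontLoadCounter` is a hypothetical helper name, not part of PDFBox.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/**
 * Hypothetical helper: counts log records published under the
 * PDCIDFontType2 logger name. Assumes PDFBox's commons-logging output
 * is routed to java.util.logging.
 */
public class FontLoadCounter extends Handler {
    // Assumed to match the logger name shown in the log output above.
    public static final String LOGGER_NAME = "org.apache.pdfbox.pdmodel.font.PDCIDFontType2";

    // Keep a strong reference: JUL may otherwise garbage-collect the
    // logger together with any handler attached to it.
    private static final Logger FONT_LOGGER = Logger.getLogger(LOGGER_NAME);

    private final AtomicInteger count = new AtomicInteger();

    /** Attaches a new counter to the font logger and returns it. */
    public static FontLoadCounter install() {
        FontLoadCounter counter = new FontLoadCounter();
        FONT_LOGGER.addHandler(counter);
        return counter;
    }

    @Override
    public void publish(LogRecord record) {
        // Assumption: each INFO record here corresponds to one font construction.
        if (record.getLevel().intValue() >= Level.INFO.intValue()) {
            count.incrementAndGet();
        }
    }

    @Override
    public void flush() {
    }

    @Override
    public void close() {
    }

    public int count() {
        return count.get();
    }

    public static void main(String[] args) {
        FontLoadCounter counter = install();
        // Stand-in for the library: log two messages under the same name.
        Logger.getLogger(LOGGER_NAME).info("OpenType Layout tables ... will be ignored");
        Logger.getLogger(LOGGER_NAME).info("OpenType Layout tables ... will be ignored");
        System.out.println("font constructions seen: " + counter.count());
    }
}
```

Calling `FontLoadCounter.install()` before the fill loop and printing `count()` afterwards should show roughly one load per field if the font really is being re-parsed each time.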
When I step through this in the debugger, I see the font being cached in
pdmodel/PDResources.java:155. However, the next time through the loop the
cache is empty, because the PDResources object involved is a different
instance. Is that the intended behavior?
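To make the question concrete, here is a hypothetical, simplified model in plain Java (not PDFBox code). It assumes a getter that returns a new resources wrapper on every call, with each wrapper owning its own font cache; the names `Form`, `Resources`, `LoadedFont` and `parseCount` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model, not PDFBox code: a form hands out a new resources
 * wrapper on every call, and each wrapper owns its own font cache.
 */
public class PerWrapperCacheDemo {

    /** Stand-in for an expensively parsed font. */
    static final class LoadedFont {
        final String name;
        LoadedFont(String name) { this.name = name; }
    }

    /** Counts how many times a font is "parsed". */
    static int parseCount = 0;

    /** Hypothetical wrapper around a shared resources dictionary. */
    static final class Resources {
        final Map<String, String> dictionary;                        // shared underlying data
        private final Map<String, LoadedFont> cache = new HashMap<>(); // lives only as long as this wrapper

        Resources(Map<String, String> dictionary) { this.dictionary = dictionary; }

        LoadedFont getFont(String key) {
            return cache.computeIfAbsent(key, k -> {
                parseCount++;                                         // the expensive parse happens here
                return new LoadedFont(dictionary.get(k));
            });
        }
    }

    static final class Form {
        private final Map<String, String> dr = new HashMap<>();
        Form() { dr.put("F1", "NotoSansCJKsc-Medium"); }

        // A new wrapper on every call, so its cache always starts empty.
        Resources getDefaultResources() { return new Resources(dr); }
    }

    public static void main(String[] args) {
        Form form = new Form();
        for (int field = 0; field < 3; field++) {
            form.getDefaultResources().getFont("F1");   // fresh wrapper per field
        }
        System.out.println("parses with fresh wrappers: " + parseCount);

        parseCount = 0;
        Resources shared = form.getDefaultResources();  // hold on to one wrapper
        for (int field = 0; field < 3; field++) {
            shared.getFont("F1");
        }
        System.out.println("parses with one shared wrapper: " + parseCount);
    }
}
```

In this model, holding on to one wrapper for the whole fill cuts the parses from one per field to one in total; whether the real PDResources cache behaves this way is an assumption.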
One thing that might be related is that the default resources for a form
don't seem to be saved even when set. For example, this code:
System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
PDResources defaultResources = acroForm.getDefaultResources();
acroForm.setDefaultResources(defaultResources);
System.out.println("set default resources to " + defaultResources);
System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
will produce this output:
acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@36d64342
acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@39ba5a14
acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@511baa65
set default resources to org.apache.pdfbox.pdmodel.PDResources@340f438e
acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@30c7da1e
acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@5b464ce8
acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@57829d67
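One possible reading of that output, sketched as another hypothetical plain-Java model (not PDFBox code): if the setter stores the wrapper's underlying dictionary and the getter wraps that dictionary in a fresh object on each call, every println shows a new identity hash even though the data was kept. `Form`, `Resources` and the map-based dictionary are assumptions for illustration; only the accessor name `getCOSObject` mirrors PDFBox.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model, not PDFBox code: the setter keeps the underlying
 * dictionary, and the getter wraps that same dictionary in a new object
 * on every call.
 */
public class WrapperIdentityDemo {

    static final class Resources {
        final Map<String, Object> dictionary;
        Resources(Map<String, Object> dictionary) { this.dictionary = dictionary; }

        // Name mirrors PDFBox's accessor for the underlying dictionary.
        Map<String, Object> getCOSObject() { return dictionary; }
    }

    static final class Form {
        private Map<String, Object> dr = new HashMap<>();

        // New wrapper per call, so each printed identity hash differs.
        Resources getDefaultResources() { return new Resources(dr); }

        // Stores the data, not the wrapper object itself.
        void setDefaultResources(Resources r) { dr = r.getCOSObject(); }
    }

    public static void main(String[] args) {
        Form form = new Form();
        Resources set = form.getDefaultResources();
        form.setDefaultResources(set);

        Resources a = form.getDefaultResources();
        Resources b = form.getDefaultResources();
        System.out.println("same wrapper object?     " + (a == b));
        System.out.println("same underlying data?    " + (a.getCOSObject() == b.getCOSObject()));
        System.out.println("set data still in place? " + (a.getCOSObject() == set.getCOSObject()));
    }
}
```

If PDFBox works like this, comparing `acroForm.getDefaultResources().getCOSObject()` across calls, rather than the PDResources objects themselves, would show whether the setting actually stuck.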
I'm definitely new to the code base, but that isn't what I expected.
If this is a bug with the cache, I'm glad to take a swing at fixing it,
but I figured I'd make sure that I wasn't doing something egregiously
wrong first.
Thanks,
William