Hi!

First off, thanks for an excellent library. It's been a pleasure to work with something where the many mysteries of PDFs are tucked away behind an easy-to-use interface.

What I'm working on is filling forms. In particular, large forms into which we sometimes need to put a variety of languages. This has mostly worked fine, but I've noticed a speed issue. As an example, take the US I-130: https://www.uscis.gov/system/files_force/files/form/i-130.pdf

If I go through this and fill every field with Roman text using the default font, it takes about 2 seconds, which is fine. If I fill it using an added Arabic font, it takes about 7 seconds. And if I use a CJK font, it takes about 140 seconds, which seems like a lot. This is with PDFBox 2.0.14 and the Oracle 1.8.201 JDK on Linux.

One interesting symptom is that I see this INFO-level log message every time I fill a text field:

   Apr 05, 2019 8:16:51 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
   INFO: OpenType Layout tables used in font NotoSansCJKsc-Medium are not implemented in PDFBox and will be ignored
   Apr 05, 2019 8:16:51 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
   INFO: OpenType Layout tables used in font NotoSansCJKsc-Medium are not implemented in PDFBox and will be ignored
   Apr 05, 2019 8:16:52 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
   INFO: OpenType Layout tables used in font NotoSansCJKsc-Medium are not implemented in PDFBox and will be ignored


When I step through this in the debugger, I see the font being cached at pdmodel.PDResources.java:155. However, the next time through the loop the cache is empty, because the PDResources object involved is a different one. Is that the intended behavior?

One thing that might be related is that the default resources for a form don't seem to be saved even when set. For example, this code:

   System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
   System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
   System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
   PDResources defaultResources = acroForm.getDefaultResources();
   acroForm.setDefaultResources(defaultResources);
   System.out.println("set default resources to " + defaultResources);
   System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
   System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());
   System.out.println("acroForm.getDefaultResources() = " + acroForm.getDefaultResources());


will produce this output:

   acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@36d64342
   acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@39ba5a14
   acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@511baa65
   set default resources to org.apache.pdfbox.pdmodel.PDResources@340f438e
   acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@30c7da1e
   acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@5b464ce8
   acroForm.getDefaultResources() = org.apache.pdfbox.pdmodel.PDResources@57829d67



I'm definitely new to the code base, but that isn't what I expected: every call returns a different PDResources object, even right after I set one explicitly.
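
To make my guess concrete, here is a minimal, self-contained Java model of what I suspect is going on. It is not PDFBox code: the Form and Resources classes, the fontLoads counter and the "F1" font name are all made up for illustration. The assumption is that each getter call wraps the same underlying dictionary in a new object, and that the font cache lives in that wrapper object rather than in anything shared.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model only: none of these classes are PDFBox classes.
class WrapperCacheSketch {

    // Counts simulated "expensive" font loads across all wrappers.
    static int fontLoads = 0;

    // Stand-in for a resources wrapper: it shares the backing dictionary
    // but keeps its font cache in an instance field.
    static final class Resources {
        final Map<String, String> dictionary;
        private final Map<String, Object> fontCache = new HashMap<>();

        Resources(Map<String, String> dictionary) {
            this.dictionary = dictionary;
        }

        Object getFont(String name) {
            Object cached = fontCache.get(name);
            if (cached == null) {
                fontLoads++;              // simulated parse of the font program
                cached = new Object();
                fontCache.put(name, cached);
            }
            return cached;
        }
    }

    // Stand-in for the form: builds a new wrapper on every getter call.
    static final class Form {
        private final Map<String, String> defaultResources = new HashMap<>();

        Resources getDefaultResources() {
            return new Resources(defaultResources);
        }
    }

    public static void main(String[] args) {
        Form form = new Form();

        // Different wrapper objects over the same backing dictionary.
        Resources a = form.getDefaultResources();
        Resources b = form.getDefaultResources();
        System.out.println("same wrapper:    " + (a == b));
        System.out.println("same dictionary: " + (a.dictionary == b.dictionary));

        // Fetching a wrapper per field starts with an empty cache each time.
        for (int field = 0; field < 3; field++) {
            form.getDefaultResources().getFont("F1");
        }
        System.out.println("loads via fresh wrappers: " + fontLoads);

        // Holding on to one wrapper reuses its cache.
        fontLoads = 0;
        Resources held = form.getDefaultResources();
        for (int field = 0; field < 3; field++) {
            held.getFont("F1");
        }
        System.out.println("loads via held wrapper:   " + fontLoads);
    }
}
```

In this model the two wrappers are different objects over the same dictionary, and the font is loaded three times when a wrapper is fetched per field but only once when a single wrapper is held. If PDFBox behaves the same way, keeping one PDResources reference across the fill loop might avoid the repeated loads, but I haven't verified that.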

If this is a bug with the cache, I'm glad to take a swing at fixing it, but I figured I'd make sure that I wasn't doing something egregiously wrong first.

Thanks,

William

