
All,

A simple tweak to the getFullUnicodeFont method to cache the loaded
font made a huge difference. Now that it no longer embeds the same
font over and over again, the resulting file is only 20% of its
original size.
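
Here is a minimal sketch of that caching tweak, written without PDFBox so it stands alone. `PerDocumentCache` is a hypothetical helper, not a PDFBox class; in the real code `D` would be `PDDocument`, `F` would be `PDFont`, and the loader would open ARIALUNI.TTF and call `PDType0Font.load(doc, in)`. It is keyed by document object on the assumption that a font loaded into one `PDDocument` should not be reused for another.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of PDFBox): loads a value at most once per
// document object. With PDFBox, D would be PDDocument, F would be PDFont and
// the loader would wrap PDType0Font.load(doc, in).
class PerDocumentCache<D, F> {
    // Identity semantics: only the very same document object gets a cache hit.
    private final Map<D, F> cache = new IdentityHashMap<>();
    private final Function<D, F> loader;

    PerDocumentCache(Function<D, F> loader) {
        this.loader = loader;
    }

    // Returns the cached value for doc, calling the loader only on first use.
    F get(D doc) {
        return cache.computeIfAbsent(doc, loader);
    }

    // Drops the entry, e.g. after the document has been saved and closed.
    void forget(D doc) {
        cache.remove(doc);
    }

    public static void main(String[] args) {
        int[] loads = {0};
        PerDocumentCache<Object, String> fonts =
                new PerDocumentCache<>(doc -> "ArialUnicodeMS #" + (++loads[0]));
        Object doc = new Object();
        fonts.get(doc);
        fonts.get(doc);
        fonts.get(doc);
        System.out.println("loader calls: " + loads[0]); // loader calls: 1
    }
}
```

If an instance only ever works on one document, a plain field (say, a hypothetical `_fullUnicodeFont`) that is cleared whenever `_doc` is replaced does the same job. Because `Function` cannot throw checked exceptions, the loader has to wrap the `IOException`, much as getFullUnicodeFont already does.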

Just so I have things sorted in my own mind: each font will still be
listed on every page where it's used, right? In the "smaller" file, I
can still see the font mentioned on more than one page, but it has the
same "CID" and the same font name ("AAAROV+ArialUnicodeMS"), rather
than several "AAA???+ArialUnicodeMS" entries with slightly different
names.

I'm also seeing the Type 1 fonts listed on multiple pages; that's
normal, right?
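
One way to read the PDFDebugger output quoted below: a font being listed on several pages looks harmless as long as every entry points to the same object (Times-Italic is [28 0 R] on every page), whereas several different objects for one base font, once the six-letter subset tag such as "AAAGXI+" is stripped, is the duplication Tilman describes. The sketch below does that grouping. `FontDuplicateCheck` and `FontRef` are hypothetical names, and the entries are typed in by hand from the listing rather than read from the PDF.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical diagnostic: groups /Font resource entries by base font name
// and collects the distinct object numbers they point to.
class FontDuplicateCheck {
    // One /Font entry as shown in PDFDebugger, e.g. [19 0 R] (AAAGXI+ArialUnicodeMS).
    static final class FontRef {
        final int objectNumber;
        final String baseFont;

        FontRef(int objectNumber, String baseFont) {
            this.objectNumber = objectNumber;
            this.baseFont = baseFont;
        }
    }

    // A subset tag is six uppercase letters followed by '+', e.g. "AAAGXI+".
    static String stripSubsetTag(String baseFont) {
        return baseFont.matches("[A-Z]{6}\\+.+") ? baseFont.substring(7) : baseFont;
    }

    // Maps each base font name to the set of distinct objects using it.
    static Map<String, Set<Integer>> objectsPerFont(List<FontRef> refs) {
        Map<String, Set<Integer>> result = new TreeMap<>();
        for (FontRef ref : refs) {
            result.computeIfAbsent(stripSubsetTag(ref.baseFont), k -> new TreeSet<>())
                  .add(ref.objectNumber);
        }
        return result;
    }

    public static void main(String[] args) {
        // Entries from page 1 (first three only), page 2 and page 3 of the listing.
        List<FontRef> refs = List.of(
                new FontRef(19, "AAAGXI+ArialUnicodeMS"),
                new FontRef(28, "Times-Italic"),
                new FontRef(29, "AAABJI+ArialUnicodeMS"),
                new FontRef(20, "Times-Roman"),
                new FontRef(31, "AAAYGI+ArialUnicodeMS"),
                new FontRef(28, "Times-Italic"),
                new FontRef(20, "Times-Roman"),
                new FontRef(28, "Times-Italic"));
        objectsPerFont(refs).forEach((name, objects) -> {
            if (objects.size() > 1) {
                // prints: ArialUnicodeMS appears as 3 separate objects: [19, 29, 31]
                System.out.println(name + " appears as " + objects.size()
                        + " separate objects: " + objects);
            }
        });
    }
}
```

With PDFBox, the base font names could instead be collected per page from `page.getResources().getFontNames()` and `PDFont.getName()`; the object numbers are easiest to read off in PDFDebugger.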

Thanks,
-chris

On 5/16/19 16:06, Christopher Schultz wrote:
> Tilman,
> 
> On 5/16/19 12:17, Tilman Hausherr wrote:
>> PDFDebugger.
> 
>> Look at the resources. If the same font occurs several times,
>> then you did something wrong. It should occur only once in a
>> document.
> 
> Okay, it looks like it is indeed showing multiple times. Here's
> what I can see in the document:
> 
>> Page 1
>>   Contents
>>   MediaBox
>>   Parent
>>   Resources (1) [8 0 R]
>>     Font (12) [15 0 R]
>>       F1 (6) [19 0 R] /T:Font /S:Type0 (AAAGXI+ArialUnicodeMS)
>>       F10 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)
>>       F11 (6) [29 0 R] /T:Font /S:Type0 (AAABJI+ArialUnicodeMS)
>>       (9 more listed; 3 Type 1 and 9 Type 0 fonts in total,
>>       including those above)
> 
> The font AAA???I+ArialUnicodeMS shows up for all of the Type 0 entries.
> 
>> Page 2
>>   [...]
>>   Resources
>>     Font (3)
>>       F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
>>       F2 (6) [31 0 R] /T:Font /S:Type0 (AAAYGI+ArialUnicodeMS)
>>       F3 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)
> 
>> Page 3
>>   [...]
>>   Resources
>>     Font (2)
>>       F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
>>       F2 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)
> 
>> Page 4
>>   [...]
>>   Resources
>>     Font (2)
>>       F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
>>       F2 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)
> 
> So perhaps I am even using the built-in fonts incorrectly, if they
> are being listed on every page. Or is each page that uses a font
> expected to have its own Font entry in its resources?
> 
> Does this mean I am "adding" the font too many times somehow?
> 
> My code looks like this:
> 
> private void writeWrappedText(PDFont font, int fontSize, String text,
>                               Color color) throws IOException {
>     int paragraphWidth = 500;
>     boolean indented = false;
>
>     String strippedText = sanitizeString(text);
>     int start = 0;
>     int end = 0;
>     int wrappedLineCnt = 1;
>
>     if (!isAnsiEncoding(strippedText)) {
>         if (logger.isDebugEnabled())
>             logger.debug("Text contains non-ansi characters: " + text);
>
>         font = getFullUnicodeFont();
>     }
>
>     for (int i : getPossibleWrapPoints(strippedText)) {
>         float width = font.getStringWidth(strippedText.substring(start, i))
>                 / 1000 * fontSize;
>         if (start < end && width > paragraphWidth) {
>             if (wrappedLineCnt == 1)
>                 setOffsetX(getOffsetXforMargin());
>             printSanitizedLine(font, fontSize,
>                     strippedText.substring(start, end),
>                     indented ? _pageIndent : 0, color);
>             wrappedLineCnt++;
>             start = end;
>         }
>         end = i;
>     }
>     if (wrappedLineCnt == 1)
>         setOffsetX(getOffsetXforMargin());
>     // Last piece of text
>     printSanitizedLine(font, fontSize, strippedText.substring(start),
>             indented ? _pageIndent : 0, color);
> }
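
The wrapping loop above is a greedy line breaker: keep extending the current line to the next possible break point until it would exceed the paragraph width, then emit what fit. Here it is pulled out as a standalone sketch so it can be tried without PDFBox. `wrapPoints` is a hypothetical stand-in for getPossibleWrapPoints (it simply breaks after spaces), and the width function stands in for `PDFont.getStringWidth`, which reports widths in 1/1000 text-space units, hence the `/ 1000 * fontSize` scaling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Standalone sketch of the greedy wrapping loop, without PDFBox.
class GreedyWrap {
    // Hypothetical stand-in for getPossibleWrapPoints(): break after each
    // space, plus the end of the text.
    static List<Integer> wrapPoints(String text) {
        List<Integer> points = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                points.add(i + 1);
            }
        }
        points.add(text.length());
        return points;
    }

    // glyphUnits plays the role of PDFont.getStringWidth(): widths in
    // 1/1000 text-space units, scaled to points by / 1000 * fontSize.
    static List<String> wrap(String text, float fontSize, float maxWidth,
                             ToDoubleFunction<String> glyphUnits) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int end = 0;
        for (int i : wrapPoints(text)) {
            double width = glyphUnits.applyAsDouble(text.substring(start, i)) / 1000 * fontSize;
            // Only break if the line already holds something (start < end),
            // so a single overlong word stays on its own line.
            if (start < end && width > maxWidth) {
                lines.add(text.substring(start, end).trim());
                start = end;
            }
            end = i;
        }
        lines.add(text.substring(start).trim()); // last piece of text
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical monospaced metrics: every glyph is 500 units wide.
        ToDoubleFunction<String> mono = s -> 500.0 * s.length();
        System.out.println(wrap("aaa bbb ccc", 10f, 40f, mono)); // [aaa bbb, ccc]
    }
}
```

The measured width includes the trailing space at each break point, which makes the check slightly conservative; the space is only trimmed when a line is emitted.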
> 
> The getFullUnicodeFont method is:
> 
> private PDFont getFullUnicodeFont() {
>     if (null == _doc)
>         throw new IllegalStateException(
>                 "Document has not yet been created; cannot load a new font");
>
>     InputStream in = null;
>     try {
>         String fullUnicodeFontFile = "/resources/fonts/ARIALUNI.TTF";
>         in = getClass().getResourceAsStream(fullUnicodeFontFile);
>         if (null == in)
>             throw new MissingResourceException(
>                     "Cannot load font file " + fullUnicodeFontFile,
>                     this.getClass().getName(), fullUnicodeFontFile);
>
>         PDFont font = PDType0Font.load(_doc, in);
>
>         return font;
>     } catch (IOException ioe) {
>         throw new RuntimeException("Cannot load font", ioe);
>     }
> }
> 
> Re-reading that code, it's obvious that I should be storing the
> font once loaded and re-using it. I'm guessing that 
> PDType0Font.load(PDDocument,InputStream) doesn't recognize that
> the font has already been loaded and just adds it a second (or
> third, etc.) time. Can anyone confirm that?
> 
> I know my code isn't great at using this "full" font only for the
> glyphs that actually need it: any non-ANSI character switches the
> whole string over. I am working to improve that, and I know there is
> example code for choosing the "best" font for each character in a
> string, which I'll be reviewing separately.
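
A minimal sketch of the per-character approach: split the string into runs, and draw only the runs the standard fonts cannot encode with the full Unicode font, instead of switching the whole string as writeWrappedText does now. Assumptions: Java's `windows-1252` charset is used as an approximation of the WinAnsiEncoding that, as far as I know, PDFBox uses for the built-in Times fonts (the two are close, but I would not rely on them being identical), and `FontRuns` is a hypothetical name. Within PDFBox itself, I believe calling `PDFont.encode(...)` on the candidate font and catching `IllegalArgumentException` is another way to test coverage.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into runs that either fit the standard
// fonts or need the full Unicode font. Coverage is approximated with Java's
// windows-1252 charset, which is close to (but not guaranteed identical to)
// WinAnsiEncoding.
class FontRuns {
    static final class Run {
        final String text;
        final boolean needsFullFont;

        Run(String text, boolean needsFullFont) {
            this.text = text;
            this.needsFullFont = needsFullFont;
        }

        @Override
        public String toString() {
            return (needsFullFont ? "FULL:" : "ANSI:") + text;
        }
    }

    static List<Run> split(String text) {
        CharsetEncoder ansi = Charset.forName("windows-1252").newEncoder();
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean currentNeedsFull = false;

        // Walk by code point so surrogate pairs are tested as one character.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            boolean needsFull = !ansi.canEncode(ch);
            // Close the current run when coverage changes.
            if (current.length() > 0 && needsFull != currentNeedsFull) {
                runs.add(new Run(current.toString(), currentNeedsFull));
                current.setLength(0);
            }
            current.append(ch);
            currentNeedsFull = needsFull;
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(current.toString(), currentNeedsFull));
        }
        return runs;
    }

    public static void main(String[] args) {
        for (Run run : split("Price: 5\u20ac \u4e2d\u6587 ok")) {
            System.out.println(run);
        }
    }
}
```

Each run would then be shown with its own font, so the full font (and its embedded subset) is only used for the characters that need it.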
> 
> Thanks, -chris
> 
>> On 16.05.2019 at 18:09, Christopher Schultz wrote:
>> All,
> 
>> We have a process that generates PDF documents, usually using the
>> default built-in Type 1 fonts, so the documents do not embed any
>> font information.
> 
>> We recently added the ability for the documents to embed font
>> information when certain glyphs are not available in the default
>> font(s), and, as expected, the files end up being bigger when that
>> happens.
> 
>> What is the best tool to look at a particular document to see
>> why it ended up being so large? I'm not sure I can visually tell
>> by looking at the document which character triggered the
>> inclusion of the font, and then why that font was used for what I
>> can only assume was a lot of text. By inspecting the file, I'm
>> sure I can improve my code so that we have fewer uses of this
>> additional font and therefore keep the file sizes to a minimum.
> 
>> Thanks, -chris
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]

