Re-using font objects
All,

Is it possible to re-use a PDFont object? We have a situation where we are building many PDFs in a single process and encountering OOMEs (OutOfMemoryError, running out of heap space). We are using a "large" (relatively speaking) TrueType font (ARIALUNI.TTF) whose on-disk representation is 22 MiB. We are loading the font into each document like this:

    PDType0Font.load(document, inputStream)

This font is only loaded when we get a conversion error with the standard built-in fonts (we are trying to create the smallest PDF documents possible for several reasons). Every OOME stack trace I've seen includes this PDType0Font.load() call, so I was thinking that maybe we'd load the font a single time on startup and re-use it for every document which needs it, but I don't see any way in the API to do such a thing.

Are PDFont objects possible to re-use? Even theoretically? It would be really great if we could do something like this:

    static PDFont bigFont;
    static {
        bigFont = PDType0Font.load(null, inputStream);
    }

    public void generateDocument() {
        ...
        PDFont localBigFont = document.addFont(bigFont);
        PDPageContentStream content = ...;
        content.setFont(localBigFont, 12);
        ...
    }

I'm currently using PDFBox v2.0.8 but upgrading to a later version should not represent too much of a problem.

-chris

- To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org For additional commands, e-mail: users-h...@pdfbox.apache.org
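As far as I understand PDFBox 2.0.x, a PDFont itself cannot be shared between documents: its font dictionary, and the subset embedded on save, belong to the PDDocument it was loaded into. What can be loaded once is the parsed FontBox font. My understanding (check the Javadoc for your version) is that you can parse ARIALUNI.TTF once into an `org.apache.fontbox.ttf.TrueTypeFont` with `new TTFParser().parse(inputStream)` and then call `PDType0Font.load(document, ttf, true)` per document, which should avoid re-reading and re-parsing the 22 MiB file every time. I would not assume the shared TrueTypeFont is safe to use from several threads at once, and it must not be closed while documents still use it. The plain-Java sketch below shows only the load-once part; `Lazy` is a hypothetical helper, not a PDFBox class.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs the loader once, on first use, and hands the
 * same instance to every later caller (double-checked locking on a volatile field).
 */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<? extends T> loader;
    private volatile T value; // null until the first successful load

    Lazy(Supplier<? extends T> loader) {
        this.loader = loader;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {               // fast path: already loaded
            synchronized (this) {
                result = value;
                if (result == null) {       // re-check under the lock
                    result = loader.get();  // the expensive load happens here, once
                    value = result;
                }
            }
        }
        return result;
    }
}
```

With PDFBox the supplier would, under the assumptions above, be something like `() -> { try (InputStream in = YourClass.class.getResourceAsStream("/resources/fonts/ARIALUNI.TTF")) { return new TTFParser().parse(in); } catch (IOException e) { throw new UncheckedIOException(e); } }` held in a static field, and each document would call `PDType0Font.load(document, sharedTtf.get(), true)` (`YourClass` and `sharedTtf` are placeholder names). Loading lazily means a process that never meets a character outside the built-in fonts never pays for the big font at all.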
Examples of word-wrapping
Hello,

It occurs to me that my code is doing more work than necessary to print paragraphs, and that maybe manual word-wrapping is not necessary. Are there any examples that show how to print a paragraph of text without having to compute the width of the text and manually chop it up into lines? Or is that part of the price of PDF?

Thanks,
-chris
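As far as I know, PDFBox 2.0.x has no built-in paragraph layout: `showText` draws exactly the string it is given on the current line, so breaking text into lines is up to the caller. The usual approach is greedy wrapping, measuring each candidate line with `font.getStringWidth(s) / 1000 * fontSize`, as the `writeWrappedText` method in the "Determining why a PDF is large" thread already does. A minimal sketch in plain Java, with the width function passed in so it can be exercised without a font; `WordWrap.wrap` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper: greedy word wrapping, filling each line with as many words as fit. */
final class WordWrap {
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // only happens when the text is blank
            String candidate = line.length() == 0 ? word : line + " " + word;
            if (line.length() > 0 && widthOf.applyAsDouble(candidate) > maxWidth) {
                lines.add(line.toString()); // adding this word would overflow: close the line
                line.setLength(0);
                line.append(word);          // the word starts the next line
            } else {
                line.setLength(0);
                line.append(candidate);     // the word still fits on the current line
            }
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }
}
```

To use it with PDFBox, pass `s -> font.getStringWidth(s) / 1000 * fontSize` (wrapping the IOException, which a ToDoubleFunction cannot throw), then draw each returned line with `showText(line)` followed by `newLineAtOffset(0, -leading)` inside one beginText()/endText() pair. A single word wider than maxWidth ends up alone on its line and overflows; hyphenation is out of scope here.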
Re: Determining why a PDF is large
All,

A simple tweak to the getFullUnicodeFont method to cache the loaded font made a huge difference. The resulting file is now only 20% of the original size, since it no longer embeds the same font over and over again.

Just so I have things sorted in my own mind: each font used will still show up on each page where it's used, right? In the "smaller" file, I can still see the font mentioned on more than one page, but it has the same "CID" and the same font name ("AAAROV+ArialUnicodeMS"; no more "AAA???+ArialUnicodeMS" coming up multiple times with slightly different names).

Of course, I'm also seeing the Type1 fonts show up repeated on multiple pages as well. That's normal, right?

Thanks,
-chris

On 5/16/19 16:06, Christopher Schultz wrote:
> Re-reading that code, it's obvious that I should be storing the
> font once loaded and re-using it. I'm guessing that
> PDType0Font.load(PDDocument,InputStream) doesn't recognize that
> the font has already been loaded and just adds it a second (or
> third, etc.) time. Can anyone confirm that?
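The tweak described above works because, as far as I can tell, every `PDType0Font.load(document, ...)` call creates a new font object in that document, with its own subset and its own random prefix (hence the many AAAxxx+ArialUnicodeMS entries). Since a loaded PDFont belongs to one document, the cache has to be per document. A minimal sketch, with type parameters D and F standing in for PDDocument and PDFont; `PerDocumentFontCache` is hypothetical.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper: loads a font at most once per document and returns
 * the same instance on later calls for that document.
 */
final class PerDocumentFontCache<D, F> {
    // Identity keys: two distinct documents never share an entry.
    private final Map<D, F> fonts = new IdentityHashMap<>();
    private final Function<? super D, ? extends F> loader;

    PerDocumentFontCache(Function<? super D, ? extends F> loader) {
        this.loader = loader;
    }

    /** Returns the cached font for this document, loading it on first request. */
    synchronized F fontFor(D document) {
        return fonts.computeIfAbsent(document, loader);
    }

    /** Forget the document (e.g. after saving and closing it) so it can be garbage-collected. */
    synchronized void release(D document) {
        fonts.remove(document);
    }
}
```

In the getFullUnicodeFont() setting, the loader would call `PDType0Font.load(doc, in)` and rethrow an IOException as an unchecked exception, as that method already does. If the generator only ever works on one `_doc` at a time, a plain field that is cleared whenever a new document is created does the same job with less machinery.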
Re: Determining why a PDF is large
Tilman,

On 5/16/19 12:17, Tilman Hausherr wrote:
> PDFDebugger.
>
> Look at the resources. If the same font occurs several times, then
> you did something wrong. It should occur only once in a document.

Okay, it looks like it is indeed showing multiple times. Here's what I can see in the document:

  Page 1
    Contents
    MediaBox
    Parent
    Resources (1) [8 0 R]
      Font (12) [15 0 R]
        F1 (6) [19 0 R] /T:Font /S:Type0 (AAAGXI+ArialUnicodeMS)
        F10 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)
        F11 (6) [29 0 R] /T:Font /S:Type0 (AAABJI+ArialUnicodeMS)
        (9 more listed: 3 total Type 1 fonts, 9 total Type 0 fonts including those above)

  The font AAA???I+ArialUnicodeMS shows up for all of the "Type 0" entries.

  Page 2
    [...]
    Resources
      Font (3)
        F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
        F2 (6) [31 0 R] /T:Font /S:Type0 (AAAYGI+ArialUnicodeMS)
        F3 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)

  Page 3
    [...]
    Resources
      Font (2)
        F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
        F2 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)

  Page 4
    [...]
    Resources
      Font (2)
        F1 (4) [20 0 R] /T:Font /S:Type1 (Times-Roman)
        F2 (4) [28 0 R] /T:Font /S:Type1 (Times-Italic)

So perhaps I am even using the built-in fonts incorrectly if they are being mentioned on every page. Or is each page which uses a font expected to have its own Font entry in the resources?

Does this mean I am "adding" the font too many times somehow?

My code looks like this:

    private void writeWrappedText(PDFont font, int fontSize, String text, Color color)
            throws IOException {
        int paragraphWidth = 500;
        boolean indented = false;

        String strippedText = sanitizeString(text);
        int start = 0;
        int end = 0;
        int wrappedLineCnt = 1;

        if (!isAnsiEncoding(strippedText)) {
            if (logger.isDebugEnabled())
                logger.debug("Text contains non-ansi characters: " + text);

            font = getFullUnicodeFont();
        }

        for (int i : getPossibleWrapPoints(strippedText)) {
            float width = font.getStringWidth(strippedText.substring(start, i)) / 1000 * fontSize;
            if (start < end && width > paragraphWidth) {
                if (wrappedLineCnt == 1)
                    setOffsetX(getOffsetXforMargin());
                printSanitizedLine(font, fontSize, strippedText.substring(start, end),
                        indented ? _pageIndent : 0, color);
                wrappedLineCnt++;
                start = end;
            }
            end = i;
        }
        if (wrappedLineCnt == 1)
            setOffsetX(getOffsetXforMargin());
        // Last piece of text
        printSanitizedLine(font, fontSize, strippedText.substring(start),
                indented ? _pageIndent : 0, color);
    }

The getFullUnicodeFont method is:

    private PDFont getFullUnicodeFont() {
        if (null == _doc)
            throw new IllegalStateException("Document has not yet been created; cannot load a new font");

        String fullUnicodeFontFile = "/resources/fonts/ARIALUNI.TTF";
        // try-with-resources closes the font stream once it has been read
        try (InputStream in = getClass().getResourceAsStream(fullUnicodeFontFile)) {
            if (null == in)
                throw new MissingResourceException("Cannot load font file " + fullUnicodeFontFile,
                        this.getClass().getName(), fullUnicodeFontFile);

            return PDType0Font.load(_doc, in);
        } catch (IOException ioe) {
            throw new RuntimeException("Cannot load font", ioe);
        }
    }

Re-reading that code, it's obvious that I should be storing the font once loaded and re-using it. I'm guessing that PDType0Font.load(PDDocument,InputStream) doesn't recognize that the font has already been loaded and just adds it a second (or third, etc.) time. Can anyone confirm that?

I know that my code isn't the best in terms of only choosing to render certain glyphs in this "full" font. I am working to improve that, and I know there is example code for choosing the "best" font for each character in a string, which I'll be reviewing separately.

Thanks,
-chris
Determining why a PDF is large
All,

We have a process that generates PDF documents usually using the default Type-1 built-in fonts, so the documents do not embed the font information.

We recently added the ability for the documents to include font information if certain glyphs were not available in the default font(s) and, as expected, the file sizes end up being bigger when that happens.

What is the best tool to look at a particular document to see why it ended up being so large? I'm not sure I can visually tell by looking at the document which character triggered the inclusion of the font, and then why that font was used for what I can only assume was a lot of text.

By inspecting the file, I'm sure I can improve my code so that we have fewer uses of this additional font and therefore keep the file sizes to a minimum.

Thanks,
-chris
Re: Choosing a font for non-ASCII characters
Tilman,

On 3/20/19 03:55, Tilman Hausherr wrote:
> Am 19.03.2019 um 22:08 schrieb Christopher Schultz:
>> I'd have to look more into what PDFont.encode does, but I'm
>> guessing that it wouldn't be too hard to build methods into the
>> PDFont class that look something like this:
>>
>>     public boolean canEncode(String s);
>>     public String getLongestEncodablePrefix(String s);
>
> That would just push what you called "Yuck" further downwards, or
> we would have to maintain code twice, one for checking whether
> something can be encoded, and one for actually doing it. And this
> for all the 6, maybe 7 font types. Code reuse?
>
> Instead of going forward with your project with the working code
> provided, you're arguing about design issues.

You are operating under the impression that I haven't already modified my own code to work. I have. I'm volunteering to help improve your product. You don't have to get so upset when someone offers help.

-chris
Re: Choosing a font for non-ASCII characters
Tilman,

On 3/19/19 16:23, Tilman Hausherr wrote:
> Am 19.03.2019 um 19:45 schrieb Christopher Schultz:
>> I'd like to support emoji and stuff like that. I can find a font
>> (or fonts) for that, but I think the only way I can do that with
>> the existing API is something like this:
>>
>>     PDFont[] fonts = new PDFont[] { builtIn, arialUnicode, emoji };
>>
>>     for (PDFont font : fonts) {
>>         try {
>>             page.setFont(font, fontSize);
>>             page.showText(text);
>>             break; // rendered successfully; don't try the remaining fonts
>>         } catch (IllegalArgumentException iae) {
>>             // Try the next font
>>         }
>>     }
>>
>> Is there any way to tell PDFBox to "pick the right font (from some
>> list) for each character"?
>
> No, that is why I created the EmbeddedMultipleFonts.java example
> which I mentioned earlier in the thread. That one can switch
> within strings.

Right, it basically does the same thing as I have above, but for a bunch of increasingly-widening substrings, and it uses exceptions for flow control. Yuck.

I'd have to look more into what PDFont.encode does, but I'm guessing that it wouldn't be too hard to build methods into the PDFont class that look something like this:

    /**
     * Returns true if this PDFont can render the whole string.
     */
    public boolean canEncode(String s);

    /**
     * Returns the longest String that can be successfully encoded by this
     * PDFont, beginning at the beginning of {s}. If the whole String {s}
     * is encodable, then {s} will be returned. If only a part of {s}
     * is encodable, then the return value of this method will be such that:
     *
     *   s.startsWith(getLongestEncodablePrefix(s)) == true
     *
     * If the first character of the string is not encodable in this PDFont,
     * an empty string (or null?) will be returned.
     */
    public String getLongestEncodablePrefix(String s);

WDYT? If this must be implemented initially by using exceptions for flow control, so be it. But theoretically, it can be improved in the future by performing faster checks... possibly by each type of PDFont subclass in a different way.

-chris
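The proposed `canEncode` and `getLongestEncodablePrefix` are not methods of PDFont in 2.0.x, but they can be written outside the library against any per-code-point check. A sketch, assuming such a check is available as an `IntPredicate`; one way to build it (an assumption, not a PDFBox feature) is to call `PDFont.encode(String)` on a one-code-point string and catch `IllegalArgumentException`, which at least confines the exception-as-flow-control to one place. Walking code points rather than chars matters for emoji, which occupy two chars (a surrogate pair) in a Java String. `Encodability` is a hypothetical class.

```java
import java.util.function.IntPredicate;

/** Hypothetical helpers built on a per-code-point "can this font encode it?" check. */
final class Encodability {
    /** True if every code point of s passes the check. */
    static boolean canEncode(String s, IntPredicate canEncode) {
        return s.codePoints().allMatch(canEncode);
    }

    /**
     * Longest prefix of s whose code points all pass the check.
     * Returns "" (never null) when the first code point fails.
     */
    static String longestEncodablePrefix(String s, IntPredicate canEncode) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!canEncode.test(cp)) break;
            i += Character.charCount(cp); // step over a surrogate pair as one unit
        }
        return s.substring(0, i);
    }
}
```

By construction `s.startsWith(longestEncodablePrefix(s, check))` always holds, which is the contract from the proposal; returning "" instead of null keeps callers from needing a null check.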
Re: Choosing a font for non-ASCII characters
Tilman,

So I'm starting to look toward making my code better now that it's actually working. Right now, my code looks like this:

    if (!isAnsiEncoding(strippedText)) {
        font = getFullUnicodeFont();
    }

Here one font simply replaces the other for strings that aren't available in the built-in font(s).

I'd like to support emoji and stuff like that. I can find a font (or fonts) for that, but I think the only way I can do that with the existing API is something like this:

    PDFont[] fonts = new PDFont[] { builtIn, arialUnicode, emoji };

    for (PDFont font : fonts) {
        try {
            page.setFont(font, fontSize);
            page.showText(text);
            break; // rendered successfully; don't try the remaining fonts
        } catch (IllegalArgumentException iae) {
            // Try the next font
        }
    }

That will "work", but it will not work if, for example, I need to print text that includes both Chinese characters (from the arialUnicode font) and also emoji (from the hypothetical "emoji" font).

Is there any way to tell PDFBox to "pick the right font (from some list) for each character"?

-chris
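Switching fonts within a string can also be expressed without trial substrings: for each code point, pick the first font in preference order that can encode it, then merge neighbouring code points that got the same font into one run. A sketch, assuming a per-code-point `IntPredicate` for each font (hypothetical; PDFBox 2.0.x does not offer one directly). `FontRuns` is a hypothetical helper, not the code of the EmbeddedMultipleFonts.java example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper: splits text into runs that can each be drawn with one font. */
final class FontRuns {
    /** A maximal piece of text drawn with one font; fontIndex points into the font list. */
    static final class Run {
        final int fontIndex;
        final String text;

        Run(int fontIndex, String text) {
            this.fontIndex = fontIndex;
            this.text = text;
        }

        @Override
        public String toString() {
            return fontIndex + ":" + text;
        }
    }

    /**
     * Assigns each code point to the first font (in preference order) whose check
     * accepts it. Throws IllegalArgumentException if no font covers a code point.
     */
    static List<Run> split(String text, List<IntPredicate> fonts) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentFont = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int chosen = -1;
            for (int f = 0; f < fonts.size() && chosen < 0; f++) {
                if (fonts.get(f).test(cp)) chosen = f;
            }
            if (chosen < 0) {
                throw new IllegalArgumentException(String.format("No font for U+%04X", cp));
            }
            if (chosen != currentFont && current.length() > 0) {
                runs.add(new Run(currentFont, current.toString())); // font changes: close the run
                current.setLength(0);
            }
            currentFont = chosen;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) runs.add(new Run(currentFont, current.toString()));
        return runs;
    }
}
```

Each run would then be drawn with `setFont(pdFonts[run.fontIndex], fontSize)` and `showText(run.text)` inside a single beginText()/endText() block (`pdFonts` being a placeholder for the PDFont array in the same order as the checks); because each text-showing operator advances the text position, the runs continue on the same line.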
Re: Choosing a font for non-ASCII characters
Tilman,

On 3/3/19 08:48, Tilman Hausherr wrote:
>> Is there a good example of using FontBox with PDFBox in order to
>> subset a font?
>
> Yes, the EmbeddedFonts.java example.

I don't see any use of FontBox in the EmbeddedFonts.java example. Am I missing something?

> We are a small team and don't have the time to write tutorials.
> There are many working examples and also many answers on
> stackoverflow.

Understood.

> You don't need fontbox unless for advanced things, e.g. reuse a
> font for several files. For normal use cases, fontbox remains under
> the hood.
>
> If you think some class documentation is useless, name it, and I'll
> see if it can be improved. :)

It's less the presence of useless documentation and more a lack of existing documentation. I can file some tickets if you think it would be helpful. I also don't mind writing documentation and/or tutorials for the project.

> The subset thing is done by PDFBox without you having to bother
> about it. It's "not subsetting" that would require more parameters.
> So you need only this:
>
>     PDType0Font font = PDType0Font.load(document, new File("c:/windows/fonts/arial.ttf"));
>     stream.setFont(font, 12);
>     stream.showText("...");

Okay, that's exactly what we are doing (well... we are loading the font via the ClassLoader, but ...). And it's working. I was just a little worried about the ballooning file size. I realize there is little to be done about that at this stage.

At this point, I am basically doing this:

  [ When adding text to the document ]
  - If the text contains anything outside of the ANSI encoding,
    then replace the usual (default) font with ARIALUNI.TTF

It operates on a per-text-string basis, so it should only change the font for a single piece of text that requires it.

I'm starting to think that I should not bother scanning the text and should instead use the IllegalArgumentException as flow control, which I still don't like. But it means that my code will not spend a ton of time repeating checks that PDFBox will end up doing anyway.

I'm a little worried about what I will do the next time I have an issue like this, where the ARIALUNI.TTF font doesn't include some character that I need... since there's no way to probe a font for support for a code point, I can't map code points to fonts in a scalable way. It will just be trial and error, which is no fun. It also means that I need to have some kind of set of fonts that we just round-robin through, hoping we get a hit so we can continue... otherwise we just have to fail (like we do now).

-chris
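Whole-string fallback with the exception as flow control can at least be kept in one helper. Two things a bare for/try loop easily gets wrong: it must stop after the first font that succeeds, and it should report a useful error when none does. A sketch, with `TextSink` as a hypothetical stand-in for `setFont(font, size)` plus `showText(text)` on a PDPageContentStream; it assumes a failed attempt leaves nothing harmful in the content stream, which is worth verifying with PDFBox (for example, the font selection has already been written when showText fails). Like any whole-string approach, it fails every font for a string that mixes, say, Chinese and emoji.

```java
import java.util.List;

/** Hypothetical stand-in for "select this font, then show this text". */
interface TextSink<F> {
    /** Expected to throw IllegalArgumentException when the font cannot encode the text. */
    void show(F font, String text);
}

/** Hypothetical helper: tries fonts in order until one renders the whole string. */
final class FontFallback {
    /** Returns the font that rendered the text; each string is drawn at most once. */
    static <F> F showWithFirstWorkingFont(List<F> fonts, String text, TextSink<F> sink) {
        IllegalArgumentException last = null;
        for (F font : fonts) {
            try {
                sink.show(font, text);
                return font; // success: do not draw the text again with the next font
            } catch (IllegalArgumentException e) {
                last = e;    // remember why this font failed, then try the next one
            }
        }
        throw new IllegalArgumentException("No configured font can render: " + text, last);
    }
}
```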
Re: Choosing a font for non-ASCII characters
Tilman,

On 3/2/19 10:00, Tilman Hausherr wrote:
> Am 02.03.2019 um 15:54 schrieb Christopher Schultz:
>> Is there a good way to probe text to determine whether or not an
>> alternate font will be necessary and only load/bundle it then?
>
> From the new EmbeddedMultipleFonts.java example (in the source
> code download):
>
>     boolean isWinAnsiEncoding(int unicode) {
>         String name = GlyphList.getAdobeGlyphList().codePointToName(unicode);
>         if (".notdef".equals(name)) {
>             return false;
>         }
>         return WinAnsiEncoding.INSTANCE.contains(name);
>     }
>
> When that one returns true, you can use the built-in fonts.

Okay, I see that. Is there any reason not to do this?

    boolean isWinAnsiEncoding(int unicode) {
        return WinAnsiEncoding.INSTANCE.contains(unicode);
    }

Is there nothing like PDFont.isSupportedCodePoint(unicode) available? I didn't see anything. It looks more like the standard way to check is to:

    try {
        page.showText(text);
    } catch (IllegalArgumentException iae) {
        page.setFont(alternateFont, fontSize);
        page.showText(text);
    }

If that's SOP, then maybe there is no real reason to bother checking whether the String will work in the first place... just try it and try again if the operation fails?

Catching IllegalArgumentException seems ugly, though. Maybe PDFBox could subclass IllegalArgumentException with something narrower like IllegalCodePointException and throw that instead? It would be backward-compatible, and one could also determine the root cause without parsing the exception message to see what the problem was.

I'm happy to provide a patch.

-chris
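On the shortcut above: as I read the PDFBox 2.0.x Javadoc, `Encoding.contains(int)` looks up a single-byte character code, not a Unicode code point, which would explain why the example goes through the Adobe glyph name first. The two only agree where WinAnsi happens to match Latin-1; the euro sign, for instance, is U+20AC but WinAnsi code 0x80. For a probe that needs no PDFBox classes at all, the JDK's windows-1252 charset is a close approximation of WinAnsiEncoding. A sketch, assuming that approximation is good enough for deciding whether to load the Unicode font; it is not guaranteed to match PDFBox's table character for character, and `WinAnsiProbe` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical probe: approximates "fits the built-in fonts" with windows-1252. */
final class WinAnsiProbe {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** True if every character of text has a windows-1252 byte code. */
    static boolean isWinAnsi(String text) {
        // CharsetEncoder is stateful and not thread-safe, so use a fresh one per call.
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode(text);
    }

    /** Same check for a single Unicode code point (handles supplementary code points). */
    static boolean isWinAnsi(int codePoint) {
        return isWinAnsi(new String(Character.toChars(codePoint)));
    }
}
```

Used up front, `if (!WinAnsiProbe.isWinAnsi(text))` would be the point at which to fetch (and cache) the Unicode font, so documents that never need it stay small.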
Re: Choosing a font for non-ASCII characters
John,

On 3/2/19 10:16, John Logan wrote:
> Christopher, is the font that you don’t want to embed a Type 1 font, or a TrueType font?

I'm using PDType0Font.load to load the font, but the file is TrueType:

ARIALUNI.TTF: TrueType Font data, digitally signed, 20 tables, 1st "DSIG", name offset 0x161c2f0

> If the latter, could you use Fontbox to subset the font and keep the file size small?

I have no idea. The information about PDFBox seems to be mostly in example programs and not in web-based documentation. Searching e.g. Google for "how to use FontBox with PDFBox" generally comes up with references into the Javadoc for "uses of FontBox interface". The Javadoc does not describe what FontBox is, and none of the classes or subclasses in those related packages really have any documentation worth reading. Each class "foo" is described as "being a foo" and each "getBar" method is described as "gets the bar for the foo".

So... discoverability of features is pretty much nil here. I'm quite happy with the responses I get on this mailing list, but it's nearly impossible to discover on my own what is possible here. I shouldn't have to get you guys to tell me how to use the software... you have better things to do (like continue to write great software).

Is there a good example of using FontBox with PDFBox in order to subset a font?

-chris

>> On Mar 2, 2019, at 7:00 AM, Tilman Hausherr wrote:
>>
>> On 02.03.2019 at 15:54, Christopher Schultz wrote:
>>> Is there a good way to probe text to determine whether or not an alternate font will be necessary and only load/bundle it then?
>>
>> From the new EmbeddedMultipleFonts.java example (in the source code download):
>>
>> boolean isWinAnsiEncoding(int unicode)
>> {
>>     String name = GlyphList.getAdobeGlyphList().codePointToName(unicode);
>>     if (".notdef".equals(name))
>>     {
>>         return false;
>>     }
>>     return WinAnsiEncoding.INSTANCE.contains(name);
>> }
>>
>> When that one returns true, you can use the built-in fonts.
>>
>> Tilman
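For a quick experiment with Tilman's idea that needs no PDFBox classes in the check itself, here is a minimal sketch that approximates it with the JDK's windows-1252 encoder. The assumption (not stated in this thread) is that windows-1252 and PDF's WinAnsiEncoding cover nearly the same characters; they are close but not identical, so treat the result as a hint and keep the GlyphList/WinAnsiEncoding check above as the authoritative version. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Approximation of "can the built-in (WinAnsiEncoding) fonts show this text?"
// ASSUMPTION: windows-1252 stands in for PDF's WinAnsiEncoding; the two are
// close but not identical.
final class StandardFontCheck {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** True if every character of text can be encoded in windows-1252. */
    static boolean fitsBuiltInLatinFont(String text) {
        // CharsetEncoder is stateful and not thread-safe, so use one per call.
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsBuiltInLatinFont("gar\u00e7on \u00ae")); // c-cedilla, registered sign
        System.out.println(fitsBuiltInLatinFont("a \u2264 b"));         // less-than-or-equal sign
        System.out.println(fitsBuiltInLatinFont("\u4e2d\u570b"));       // "China" in Traditional Chinese
    }
}
```

If this returns true for all of a document's text, the built-in Type 1 fonts should be enough and nothing needs to be embedded; otherwise load the Unicode font, which PDType0Font.load subsets by default so only the used glyphs end up in the file.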
Re: Choosing a font for non-ASCII characters
Andreas,

On 1/31/19 01:27, Andreas Lehmkuehler wrote:
> the standard pdf font (PDType1Font.HELVETICA et. al.) don't support anything else than (limited) latin1. You have to use something else.
>
> Have a look at the HelloWorldTTF example [1]. It shows how to embed a true type font. You have to choose a suitable font from your OS or something like the noto fonts from google.
>
> W.r.t. font embedding. It's always a good idea to embed all resources which are needed to render a pdf. PDFBox reduces the amount of space as it limits the embedded font to the used characters.

Thanks for the pointers. I'm finally getting around to doing something about this. I used "Arial Unicode" as referenced in a quickie online tutorial[1] and what I'm finding is that:

1. The Chinese characters render correctly (yay!)
2. My English-only file has gone from ~1k to ~18k

This test file was the simplest I could muster, so it's really an unfair comparison at this point. But it's clear that the file will get bigger (of course) by adding the font.

I'd like to avoid bundling the font unless it's necessary. For several months, we've been able to get away with the standard PDF default fonts (which, presumably, the PDF spec requires all clients to provide, which is why the files can be so small).

Is there a good way to probe text to determine whether or not an alternate font will be necessary and only load/bundle it then?

Thanks,
-chris

[1] http://www.kscodes.com/java/write-chinese-pdf-using-apache-pdfbox/

> On 31.01.19 at 02:56, Christopher Schultz wrote:
> Hello,
>
> We are using PDFBox to generate PDFs in a very simple way and only including fonts available from the PDType1Font class (e.g. PDType1Font.HELVETICA). The PDFs we are generating really only include a few titles/subtitles, text, and bulleted/numbered lists.
>
> Everything is fine when we use what is probably in the standard Latin alphabet, and we've had some troubles with special characters that don't fit in there, such as ≥ and ≤. We've dealt with that by simply replacing "≤" with "<=" and so on, but we're starting to use languages that don't use Latin script, and so we can no longer replace our way out of the problem.
>
> For example, I need to be able to put Chinese characters into a PDF we generate. So let's take the text "中國", which is just the word "China" in Traditional Chinese script.
>
> First, how can I find out that the character isn't going to fit into the font that I'm currently using? Should I do it for every character we try to put into the page, or should we just catch exceptions when we try to write the text to the page and then scan at that point? I'm trying to avoid writing hideously inefficient code to handle these situations.
>
> Second, once I know that I need to choose another font... how do I know which font to choose? Should I keep a mapping of Unicode code point ranges and the best fonts to use for them?
>
> Finally, what fonts are actually available to PDFBox? How do I add new ones? I have a lot of control over the environment and I get to see failing conversions and intervene, so some trial and error is okay for each new situation.
>
> The recipients of our PDFs are file-size sensitive, so I'd only want to include (bundle) a font in a PDF if it was absolutely necessary to include the font itself. If we can get away with including a *reference* to the font in the PDF and telling these recipients "sorry, if you want to read the Chinese PDFs we send, you'd better make sure you have font X installed", then that's okay with me, too.
>
> What suggestions do people have for doing all of the above?
>
> Thanks, -chris
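Probing every character does not have to be expensive if the text is split into runs up front: consecutive characters that the built-in font can encode stay together, and only the runs that cannot are drawn with a fallback font. The sketch below does that split with plain JDK classes. It uses windows-1252 as a stand-in for WinAnsiEncoding (an approximation, not PDFBox's own check), and FontRuns/Run are hypothetical names. With PDFBox one would then call setFont(...) and showText(...) on the PDPageContentStream once per run, and load the fallback font only when needsFallbackFont(...) is true.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split text into runs for the built-in font vs. a fallback font.
// ASSUMPTION: windows-1252 approximates PDF WinAnsiEncoding (close, not identical).
final class FontRuns {

    /** A run of consecutive characters that share the same font decision. */
    static final class Run {
        final String text;
        final boolean needsFallback;

        Run(String text, boolean needsFallback) {
            this.text = text;
            this.needsFallback = needsFallback;
        }

        @Override
        public String toString() {
            return (needsFallback ? "fallback:" : "built-in:") + text;
        }
    }

    static List<Run> split(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean currentNeedsFallback = false;

        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint)); // keeps surrogate pairs together
            boolean needsFallback = !encoder.canEncode(ch);
            if (current.length() > 0 && needsFallback != currentNeedsFallback) {
                runs.add(new Run(current.toString(), currentNeedsFallback));
                current.setLength(0);
            }
            currentNeedsFallback = needsFallback;
            current.append(ch);
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            runs.add(new Run(current.toString(), currentNeedsFallback));
        }
        return runs;
    }

    /** Only worth loading (and embedding) the fallback font if some run needs it. */
    static boolean needsFallbackFont(List<Run> runs) {
        for (Run run : runs) {
            if (run.needsFallback) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(split("China: \u4e2d\u570b!"));
    }
}
```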
Choosing a font for non-ASCII characters
Hello,

We are using PDFBox to generate PDFs in a very simple way and only including fonts available from the PDType1Font class (e.g. PDType1Font.HELVETICA). The PDFs we are generating really only include a few titles/subtitles, text, and bulleted/numbered lists.

Everything is fine when we use what is probably in the standard Latin alphabet, and we've had some troubles with special characters that don't fit in there, such as ≥ and ≤. We've dealt with that by simply replacing "≤" with "<=" and so on, but we're starting to use languages that don't use Latin script, and so we can no longer replace our way out of the problem.

For example, I need to be able to put Chinese characters into a PDF we generate. So let's take the text "中國", which is just the word "China" in Traditional Chinese script.

First, how can I find out that the character isn't going to fit into the font that I'm currently using? Should I do it for every character we try to put into the page, or should we just catch exceptions when we try to write the text to the page and then scan at that point? I'm trying to avoid writing hideously inefficient code to handle these situations.

Second, once I know that I need to choose another font... how do I know which font to choose? Should I keep a mapping of Unicode code point ranges and the best fonts to use for them?

Finally, what fonts are actually available to PDFBox? How do I add new ones? I have a lot of control over the environment and I get to see failing conversions and intervene, so some trial and error is okay for each new situation.

The recipients of our PDFs are file-size sensitive, so I'd only want to include (bundle) a font in a PDF if it was absolutely necessary to include the font itself. If we can get away with including a *reference* to the font in the PDF and telling these recipients "sorry, if you want to read the Chinese PDFs we send, you'd better make sure you have font X installed", then that's okay with me, too.

What suggestions do people have for doing all of the above?

Thanks,
-chris
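On the "mapping of Unicode code point ranges" idea: the JDK already classifies code points by script through Character.UnicodeScript, which is usually easier to maintain than hand-written ranges. A minimal sketch, assuming a small table of fonts you have chosen yourself; the file names below are hypothetical placeholders, not recommendations, and ARIALUNI.TTF is only used as the last-resort entry because it is the font mentioned in these threads.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical script-to-font table. The font names are placeholders.
final class ScriptFontTable {

    private static final Map<UnicodeScript, String> FONT_FOR_SCRIPT =
            new EnumMap<>(UnicodeScript.class);

    static {
        FONT_FOR_SCRIPT.put(UnicodeScript.LATIN, "Helvetica");                    // built-in Type 1 font
        FONT_FOR_SCRIPT.put(UnicodeScript.HAN, "hypothetical-cjk-font.ttf");
        FONT_FOR_SCRIPT.put(UnicodeScript.CYRILLIC, "hypothetical-cyrillic-font.ttf");
    }

    // Used for any script not listed above.
    private static final String LAST_RESORT = "ARIALUNI.TTF";

    /** Picks a font for one code point based on its Unicode script. */
    static String fontFor(int codePoint) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        return FONT_FOR_SCRIPT.getOrDefault(script, LAST_RESORT);
    }

    public static void main(String[] args) {
        System.out.println(fontFor('A'));
        System.out.println(fontFor(0x4E2D)); // first character of the "China" example
    }
}
```

Characters in the COMMON script (spaces, digits, punctuation, and symbols such as ≥ and ≤) are deliberately left out of the table: they usually belong with whatever font the surrounding text uses, and a symbol like ≤ still needs an encodability check because it cannot be encoded with the built-in fonts' WinAnsiEncoding.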
Re: PDFBox JPEG2000 and Tomcat - Revisited
Joel,

On 8/1/18 3:54 PM, Joel Hirsh wrote:
> However, your comment that "Tomcat doesn't unload libraries" is not entirely true. There is an explanation of such symptoms at https://haraldk.github.io/TwelveMonkeys/, under the section "Deploying the plugins in a web app".

Thanks for posting that. Lemmie 'splain.

In a servlet container, web applications (aka "contexts") each have their own ClassLoader, and if the application is undeployed or re-deployed, then that ClassLoader should become eligible for garbage collection. If something outside the webapp still holds a reference to a class (or an instance of a class) loaded by that webapp's ClassLoader, the ClassLoader cannot be collected: a memory-leak situation known as a "pinned ClassLoader". It happens all the time when you use a system-global cache such as the one ImageIO uses. This can also happen with JDBC's DriverManager and a handful of other common APIs, and it can be a serious problem for servers where there are lots of redeployments. You can also start to see errors like "ClassCastException: Cannot cast foo.bar.Class to foo.bar.Class", because instanceof-ness is defined not just by the Class but also by its ClassLoader.

The IIOProviderContextListener mentioned in that documentation is one strategy to handle the unfortunate way ImageIO caches classes. Another strategy is to put the library into a place where it is accessible to all web applications instead of just one.

> Just putting jars in a shared folder did not help.

Here is where asking on the Tomcat mailing list *will* be a good idea. Make sure to give your exact Tomcat version and explain where you put the library.

Can you post the *entire* stack trace from the failure? It looks incomplete: it doesn't seem to have a "true root cause". It just says "Could not initialize class" because "Could not initialize class". Somewhere there should be a real root cause like a NullPointerException or something that can actually be debugged. Definitely post that to the Tomcat mailing list as well if you post there.

-chris
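"Could not initialize class X" is the JVM's shorthand for "X's static initializer already failed once". The real root cause is only thrown on the first attempt, wrapped in an ExceptionInInitializerError, so it usually appears much earlier in the log than the NoClassDefFoundError. A minimal, self-contained demonstration; Fragile is a hypothetical stand-in for a class such as JBIG2ImageReader, and rootCause is a small helper for walking getCause().

```java
import java.util.ArrayList;
import java.util.List;

final class StaticInitDemo {

    // Hypothetical class whose static initialization always fails.
    static final class Fragile {
        static final int VALUE = init();

        private static int init() {
            throw new IllegalStateException("simulated failure in static initializer");
        }
    }

    /** Walks the cause chain down to the innermost Throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    private static List<Throwable> observed;

    /** Touches Fragile twice (once per JVM) and records what each attempt threw. */
    static synchronized List<Throwable> touchTwice() {
        if (observed == null) {
            List<Throwable> errors = new ArrayList<>();
            for (int attempt = 0; attempt < 2; attempt++) {
                try {
                    int ignored = Fragile.VALUE; // triggers class initialization
                } catch (Throwable t) {
                    errors.add(t);
                }
            }
            observed = errors;
        }
        return observed;
    }

    public static void main(String[] args) {
        for (Throwable t : touchTwice()) {
            System.out.println(t + "  -> root cause: " + rootCause(t));
        }
    }
}
```

The first attempt reports the IllegalStateException as its cause; the second only reports NoClassDefFoundError. Some newer JDKs attach information about the original failure to the later error, older ones do not, so searching the log for the first ExceptionInInitializerError is the reliable route.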
Re: PDFBox JPEG2000 and Tomcat - Revisited
Joel,

On 8/1/18 1:38 PM, Tilman Hausherr wrote:
> On 01.08.2018 at 17:42, Joel Hirsh wrote:
>> And what appears to be the same error is back. Running one JPEG2000 image is fine, but at some point I get the error
>>
>> java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.jbig2.JBIG2ImageReader
>> at
>
> JBIG2ImageReader is for JBIG2 images, not for JPEG2000 images.
>
> I suggest you ask the same on the tomcat mailing list, maybe they can help... sadly I don't know more than last time.

... and before you do, just know that Tomcat doesn't unload libraries...

-chris
Re: How can a java class load a static pdf file in WebLogic 12c?
Shawn,

On 4/19/18 1:05 PM, shawn.oplin...@gmail.com wrote:
> On 2018/04/19 14:41:18, Christopher Schultz <ch...@christopherschultz.net> wrote:
> Fabio,
>
> On 4/19/18 10:26 AM, Fabio Salvi wrote:
>>>> Hello Shawn,
>>>>
>>>> I use something like this:
>>>>
>>>> InputStream resourceAsStream = getClass().getClassLoader()
>>>>         .getResourceAsStream("/META-INF/pdfforms/" + aPDFFormularName);
>>>> PDDocument pdfDocument = PDDocument.load(resourceAsStream);
>>>>
>>>> This is inside an EJB, but I believe it will work for a WAR as well.
>
> Yes, it will work inside of anything where the ClassLoader can get to the file. Sometimes it makes more sense to use the servlet's getResource() method because it may have a wider selection of storage locations to consult.
>
> A few things:
>
> 1. Make sure to close the InputStream in a finally block. Resource leaks are serious business on servers like this.
>
> 2. Watch out for your memory settings. You may end up loading a *huge* PDF into memory without realizing it.
>
> 3. Remember that ClassLoader.getResourceAsStream can return null.
>
> -chris
>
>>>> 2018-04-19 15:13 GMT+02:00 shawn.oplin...@gmail.com <shawn.oplin...@gmail.com>:
>>>>>
>>>>> I need to load a static PDF document from a java class running in my J2EE web app on WebLogic 12c; however, although my code works in Tomcat, when trying to run it in WebLogic 12c (WebLogic Server Version: 12.2.1.2.0), I get a server error that the PDF file cannot be found (java.io.FileNotFoundException).
>>>>>
>>>>> I am using Apache's PDF library, PDFBox version 2.0.8, to load a fillable PDF file that I created, and then populate that fillable PDF with data. My code works fine in Tomcat, but fails to find the pdf file when deployed to WebLogic 12c.
>>>>>
>>>>> This appears to be because when an EAR file is deployed to WebLogic 12c, the contents of the WAR file (all of the application code/files, including the fillable PDF file) remain archived in a jar file that WebLogic creates, instead of being exploded.
>>>>>
>>>>> My application uses the standard Maven application structure, so, as is standard with all static files, I have put my PDF file in the directory for static resources: src/main/resources/
>>>>>
>>>>> In my pom.xml file, I have the following, which builds any pdf files in the /src/main/resources/ folder into the class path root folder of the WAR file:
>>>>>
>>>>> <resource>
>>>>>   <directory>${basedir}/src/main/resources/</directory>
>>>>>   <includes>
>>>>>     <include>**/*.xml</include>
>>>>>     <include>**/*.properties</include>
>>>>>     <include>**/*.pdf</include>
>>>>>   </includes>
>>>>> </resource>
>>>>>
>>>>> When I build the WAR and EAR file, the pdf file does indeed get copied into the root folder of the application's class files.
>>>>>
>>>>> The following 3 lines of code work to load the PDF when my application's EAR file is deployed in Tomcat, but do not in WebLogic 12c (WebLogic Server Version: 12.2.1.2.0).
>>>>>
>>>>> // this classLoader works for Tomcat, but not in WebLogic 12c
>>>>> ClassLoader classLoader = getClass().getClassLoader();
>>>>> File file = new File(classLoader.getResource("myPdfFile.pdf").getFile());
>>>>> PDDocument document = PDDocument.load(file);
>>>>>
>>>>> WebLogic 12c produces the following error:
>>>>>
>>>>> <[ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <3902331f-a214-42fe-a6a1-35b3531e4b56-00a9> <1523985710798> <[severity-value: 8] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <[ServletContext@1661988196[app:mmhsrp-ear-4.9.0.1-3 module:/mmhsrp path:null spec-version
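The File-based code fails because, when the PDF is packaged inside an archive, classLoader.getResource(...) returns a jar: (or container-specific) URL, and URL.getFile() on that does not name a real file on disk. Reading the resource as a stream avoids the problem. A minimal sketch that follows Chris's three points (close the stream, watch memory, handle null); ClasspathPdf and its methods are hypothetical names. One detail worth checking in Fabio's snippet: ClassLoader.getResourceAsStream expects names without a leading '/', unlike Class.getResourceAsStream, where a leading '/' means "from the classpath root".

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Reads a classpath resource without assuming it exists as a file on disk
// (it may live inside a JAR/WAR/EAR).
final class ClasspathPdf {

    static byte[] readResource(ClassLoader loader, String name) throws IOException {
        // ClassLoader resource names are relative to the classpath root: no leading '/'.
        String normalized = name.startsWith("/") ? name.substring(1) : name;
        // try-with-resources closes the stream; a null resource is simply not closed.
        try (InputStream in = loader.getResourceAsStream(normalized)) {
            if (in == null) {
                throw new FileNotFoundException("classpath resource not found: " + normalized);
            }
            return readAll(in);
        }
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try {
            // Hypothetical resource name, for illustration only.
            readResource(ClasspathPdf.class.getClassLoader(), "/META-INF/pdfforms/example.pdf");
        } catch (FileNotFoundException e) {
            System.out.println("Not packaged: " + e.getMessage());
        }
    }
}
```

The resulting byte[] can be handed to PDDocument.load(byte[]) in PDFBox 2.0.x; alternatively, pass the InputStream to PDDocument.load(InputStream) directly inside the same try-with-resources block and skip the copy.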
Re: OutOfMemoryError in PDExtendedGraphicsState#getLineDashPattern
Andreas,

On 3/20/18 5:35 PM, Andreas Hubold wrote:
> I'm getting an OutOfMemoryError from PDFBox when parsing a certain PDF using the Apache Tika App v1.17, which uses PDFBox 2.0.8 internally. This is reproducible even with an 8GB heap.
>
> The OutOfMemoryError happens in org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState#getLineDashPattern, which contains this piece of suspicious code:
>
> COSArray dp = (COSArray) dict.getDictionaryObject( COSName.D );
> if( dp != null )
> {
>     COSArray array = new COSArray();
>     dp.addAll(dp);
>
> The last line seems to be wrong?

That certainly looks wrong to me.

> It appends all elements from 'dp' to 'dp' again, effectively duplicating the elements in the list. Maybe it should be 'array.addAll(dp)' or something like that?
>
> Can you confirm this being a bug? Should I open a JIRA ticket for this problem?
>
> Do you know a workaround to avoid the crash, e.g. an option to skip some parts of the file for text extraction?
-chris
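If getLineDashPattern() really mutates the array it reads, that would explain why even an 8GB heap doesn't help: dp is the array stored in the graphics state dictionary, so every call appends the array to itself and doubles it, and after a few dozen calls the size has grown exponentially. The sketch below models both shapes with a plain ArrayList standing in for COSArray (an assumption: that COSArray.addAll appends elements the same way). The rest of the real method is not shown in the report, so this only models the quoted lines.

```java
import java.util.ArrayList;
import java.util.List;

// Model of the reported bug; ArrayList stands in for COSArray (ASSUMPTION).
final class DashPatternBug {

    /** Buggy shape: appends the stored list to itself, mutating the stored value. */
    static List<Integer> buggyRead(List<Integer> stored) {
        List<Integer> array = new ArrayList<>(); // never filled, as in the quoted code
        stored.addAll(stored);                   // doubles the stored list on every call
        return array;
    }

    /** Fixed shape: copies into the new list and leaves the stored one alone. */
    static List<Integer> fixedRead(List<Integer> stored) {
        List<Integer> array = new ArrayList<>();
        array.addAll(stored);
        return array;
    }

    public static void main(String[] args) {
        List<Integer> stored = new ArrayList<>(java.util.Arrays.asList(3, 2));
        for (int call = 1; call <= 10; call++) {
            buggyRead(stored);
            System.out.println("after call " + call + ": " + stored.size() + " elements");
        }
    }
}
```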
Re: Strategy for dealing with non-latin characters in base font
Tilman,

On 3/10/18 6:26 AM, Tilman Hausherr wrote:
> On 09.03.2018 at 21:57, Tilman Hausherr wrote:
>> You can subset fonts by using PDType0Font.load(), this limits memory usage.
>
> I meant "file size".
>
> Also important: try to use each font only once per file. You can reuse fonts and images within a file. See also
>
> https://stackoverflow.com/questions/48377121/pdfbox-generates-pdf-a-file-of-very-large-size

Okay, we don't have any calls to any font's load() method... we are simply using the PDType1Font constants for our various fonts (HELVETICA_* and TIMES_*). Will that give roughly the same behavior? We don't seem to be having any problems with file size (yet).

I'm not even really sure what our options are for fonts; our usage of PDFBox is quite simplistic.

Let's say I want to be able to render a character such as ® or ≤ in a PDF. How do I go about discovering which (available) font contains the proper mapping, and then ensure that font's glyphs are included in the PDF?

Is it better to pick a font for the whole document that includes that glyph, or is it better to change the font for a single character, just to get the right display?
-chris
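Tilman's "use each font only once per file" advice boils down to creating the font object once per document and reusing the same reference for every setFont(...) call. When fonts are loaded lazily from several places in the code, a tiny per-document cache keeps that true. A minimal sketch with generic placeholders: D stands for PDDocument, F for PDFont, and the loader is hypothetical. A real loader calling PDType0Font.load(document, file) would have to wrap its checked IOException (e.g. in UncheckedIOException) to fit a java.util.function.Function.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: at most one font object per document.
// D and F are placeholders for PDDocument and PDFont.
final class PerDocumentFontCache<D, F> {

    // Identity semantics: two distinct document objects never share an entry.
    private final Map<D, F> fonts = new IdentityHashMap<>();
    private final Function<D, F> loader;
    private int loads = 0;

    PerDocumentFontCache(Function<D, F> loader) {
        this.loader = loader;
    }

    /** Returns the font for doc, loading it at most once per document. */
    synchronized F fontFor(D doc) {
        return fonts.computeIfAbsent(doc, d -> {
            loads++;
            return loader.apply(d);
        });
    }

    /** Drop the entry once the document has been saved and closed, to avoid leaks. */
    synchronized void forget(D doc) {
        fonts.remove(doc);
    }

    synchronized int loadCount() {
        return loads;
    }
}
```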
Strategy for dealing with non-latin characters in base font
All,

Like many other folks, we often run across the "[character] is not available in this font [font]" error when we try to use some character such as ≥ or ≤ or ® or whatever. (We are trying to keep the generated PDF files as small as possible, so we'd prefer to use the simplest fonts possible.)

For most of these characters, we have simply created a mapping in our PDF-generation wrapper code that does roughly this:

string = string.replace("≤", "<=");
string = string.replace("®", "(R)");
etc.

(It's smarter and more efficient than that, but you get the idea.)

We are starting to find more and more characters that aren't quite as decomposable as those shown above, such as ç (that's the c-with-hook you see in the word garçon). We have resorted to replacing them with just a plain-old 'c' for the time being. But I'm wondering if there is a better way to handle these.

I'm perfectly happy to replace ® with (R) for the most part, but when we have these characters with no direct replacements (and our product includes support for emoji, etc.), is there a way to add a single glyph for a single character to a PDF? Something like document.addCharacterGlyph('ç', image)? Or is there a good way to determine what font contains a particular glyph such as ç?

One of our goals is to keep the resulting size of these PDFs to a minimum, so we'd like some strategy that doesn't require including MBs of font information just on the off-chance that a string we need to put in there has such a character.

Thanks in advance,
-chris
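A single-pass version of that replacement table can also handle the "no direct replacement" cases generically: keep characters the built-in font can encode, apply explicit replacements next, then try Unicode decomposition (NFD) and drop the combining marks, and only fall back to '?' when nothing works. A minimal sketch that approximates WinAnsiEncoding with the JDK's windows-1252 encoder (an assumption: close, but not identical); WinAnsiFallback is a hypothetical name. Note that with this check ç (U+00E7) and ® (U+00AE) pass through unchanged, because both are in windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-pass fallback for the built-in fonts.
// ASSUMPTION: windows-1252 approximates PDF WinAnsiEncoding.
final class WinAnsiFallback {

    private static final Map<Integer, String> REPLACEMENTS = new HashMap<>();

    static {
        REPLACEMENTS.put(0x2264, "<="); // U+2264 LESS-THAN OR EQUAL TO
        REPLACEMENTS.put(0x2265, ">="); // U+2265 GREATER-THAN OR EQUAL TO
    }

    static String toBuiltInFontText(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            String ch = new String(Character.toChars(codePoint));

            if (encoder.canEncode(ch)) {            // 1. usable as-is
                out.append(ch);
                continue;
            }
            String mapped = REPLACEMENTS.get(codePoint);
            if (mapped != null) {                   // 2. explicit replacement
                out.append(mapped);
                continue;
            }
            // 3. decompose and strip combining marks, e.g. o-double-acute -> o
            String stripped = Normalizer.normalize(ch, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            boolean usable = !stripped.isEmpty() && encoder.canEncode(stripped);
            out.append(usable ? stripped : "?");   // 4. last resort
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBuiltInFontText("a \u2264 b, gar\u00e7on, Erd\u0151s"));
    }
}
```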
Re: Using Pdfbox in android app.
Michał,

On 3/2/18 9:05 AM, Michał Walawender wrote:
> I am an Android developer, and I would like to ask you a question about PDFBox usage in an app which I am currently developing. I have checked the license, but I am still not sure if I can use PDFBox in my app with ads in it?

AL2 (the Apache License, version 2.0) is a fairly liberal license. I don't see why you couldn't use PDFBox in an app that contains ads.

> It is just an app which will be placed in Google Play, and I want to monetize it with ads inside. Can you explain to me if I can use PDFBox based on the current license, and do I have to place any disclaimers or something?

No disclaimers are necessary, since your app is not a "derivative work". Attribution is not strictly necessary, but would be greatly appreciated by the project. IANAL, YMMV, etc.

I'm curious... why would an Android app need to produce PDF files?

-chris
Re: [MISC] Cannot use the web interface from "https://lists.apache.org/list.html?users@pdfbox.apache.org", login does not work.
All,

I think you have to be in the ASF address book (e.g. a committer) in order to post from lists.apache.org. Not absolutely sure, though.

-chris

On 2/2/18 3:24 PM, Tilman Hausherr wrote:
> We can't help you there; ask Apache infra. To contact them, study this:
>
> https://apache.org/dev/infra-contact
>
> Tilman
>
> On 02.02.2018 at 09:09, Serban Alexe wrote:
>> I cannot use the login interface from the web site to reply directly to email threads.
>>
>> I tried the "Sign in with Google" option, but the next screen hangs forever (see attached screenshot). Same behaviour in Firefox, MS Internet Explorer, and MS Edge.
>>
>> Any ideas?
>>
>> Thanks.
Re: tips on pdf font size
To whom it may concern,

On 8/23/17 7:01 AM, chitgoks wrote:
> hi.
>
> if anyone has ideas regarding font size in pdf and the font size in html, I would like to ask for some input.
>
> I created a free text annotation using coordinates taken from javascript code, with the html font size and a fixed font (helvetica), but the result is not the same when generated in pdf.
>
> using the html font size value for the annotation results in the text size being too big.
>
> so it is not in pixels when it comes to pdf but in points, and there are various things to consider when converting the pixel font size to the equivalent point font size.
>
> what is the most common factor to convert the html font size to the pdf font size in points?

What is the "font size" in the HTML document? What units are being used there?

http://www.endmemo.com/sconvert/pixelpoint.php

-chris
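For the pixel-to-point question: CSS defines 1in = 96px, and PDF user space defaults to 72 units (points) per inch, so one CSS pixel is 72/96 = 0.75pt at 100% zoom. If the JavaScript side measures on a scaled page view (for example a rendered PDF page shown at some zoom), the font size has to be divided by the same scale factor that is applied to the coordinates. A minimal sketch; the zoom parameter is an assumption about how the HTML view is set up.

```java
final class CssFontSize {

    // CSS reference: 1in = 96px. PDF default user space: 1in = 72 units (points).
    static final double POINTS_PER_CSS_PIXEL = 72.0 / 96.0; // 0.75

    /**
     * Converts a CSS pixel size to PDF points.
     * zoom is an ASSUMPTION about the HTML view: 1.0 means CSS pixels at the
     * standard 96 per inch with no extra scaling of the displayed page.
     */
    static double pxToPt(double cssPixels, double zoom) {
        if (zoom <= 0) {
            throw new IllegalArgumentException("zoom must be positive");
        }
        return cssPixels * POINTS_PER_CSS_PIXEL / zoom;
    }

    public static void main(String[] args) {
        System.out.println(pxToPt(16, 1.0));
    }
}
```

So a 16px HTML font corresponds to 12pt at 100% zoom, and to 6pt if the page view was shown at 200%.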
Re: Limit PDF size
Tilman,

On 8/2/17 2:28 AM, Tilman Hausherr wrote:
> On 02.08.2017 at 01:17, Christopher Schultz wrote:
> Tilman,
>
> On 8/1/17 4:42 PM, Tilman Hausherr wrote:
>>>> On 01.08.2017 at 22:09, Christopher Schultz wrote:
>>>> Tilman,
>>>>
>>>> On 8/1/17 3:22 PM, Tilman Hausherr wrote:
>>>>>>> The only thing that comes close to what you want is to create your PDDocument with MemoryUsageSetting.setupMixed(...) as parameter.
>>>> So that we can buffer to disk if the in-memory representation gets too big? That sounds like a good approach, and probably the most useful to me.
>>>>
>>>> It also appears that I can set a maximum in-memory limit like this:
>>>>
>>>> MemoryUsageSetting mus = MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024);
>>>> PDDocument doc = new PDDocument(mus);
>>>>
>>>>> Yes. Although this would mean you'd get an exception if you use more. That's why I recommend the mixed one. You could use the memory limit for stress tests, i.e. create the "worst" possible file and see what you need.
> I think I'm okay with an exception in these cases. As I said, our PDFs only end up being a few kiB in size, so I've put a 1MiB cap on the memory-only memory usage strategy for the time being.
>
> I'm curious about what's being constrained here... does PDFBox estimate its current memory usage of various PD* objects in memory and push to disk when that's exceeded, or does it just limit the amount of memory that gets used when serializing out to a stream?
>
>> There is no estimate... it writes in the dedicated space and if it is full, it's either an exception (if memory only) or writing to the disk cache.

I get that, but I want to understand exactly what things are "counted".

>> Yes... it's mostly images, fonts and page content streams.

So, if I write an image to the PDDocument, that "counts" towards the memory/disk limits? What about plain text? Or the PDPage/PDPageContentStreams? If I write 1000 pages of plain text to the PDDocument object, will that "fill up" the limited memory I have configured? Or does that memory limit only count when e.g. serializing to a "compiled" PDF file (or whatever the right terminology is)?

>> [Using built-in fonts] is even better, because it doesn't use any additional space (and is faster too). Your application is a very simple one :-)

Yes, we are just taking some raw information and exporting it as a PDF. We wanted something simple AND we wanted to have file sizes as small as possible. I have a related question about fonts, but I'll ask that in a separate thread.

>> You really should worry about other things... choose one or many: climate change, russian hackers, terrorism, rising interest rates, traffic jams, heavy rain flooding your basement, people who don't wash their hands, whatever :-)

Who says I'm not a Russian hacker/taxi driver/hedge fund manager who pours truckloads of water on people's houses without warning?
-chris
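Since, per Tilman, the memory setting counts mostly images, fonts and page content streams, it only indirectly catches a runaway loop that keeps adding pages of plain text. A more direct guard is to count pages yourself and fail fast. A minimal sketch; PageBudget is a hypothetical helper, and the idea is to call beforeNewPage() right before each document.addPage(...).

```java
// Hypothetical guard against runaway document generation.
final class PageBudget {

    private final int maxPages;
    private int pages;

    PageBudget(int maxPages) {
        if (maxPages <= 0) {
            throw new IllegalArgumentException("maxPages must be positive");
        }
        this.maxPages = maxPages;
    }

    /** Call before adding each page; throws once the limit has been reached. */
    void beforeNewPage() {
        if (pages >= maxPages) {
            throw new IllegalStateException(
                    "refusing to add page " + (pages + 1) + "; limit is " + maxPages);
        }
        pages++;
    }

    int pagesUsed() {
        return pages;
    }

    public static void main(String[] args) {
        PageBudget budget = new PageBudget(2);
        budget.beforeNewPage();
        budget.beforeNewPage();
        try {
            budget.beforeNewPage();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```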
Re: Limit PDF size
Tilman,

On 8/1/17 4:42 PM, Tilman Hausherr wrote:
> On 01.08.2017 at 22:09, Christopher Schultz wrote:
> Tilman,
>
> On 8/1/17 3:22 PM, Tilman Hausherr wrote:
>>>> The only thing that comes close to what you want is to create your PDDocument with MemoryUsageSetting.setupMixed(...) as parameter.
> So that we can buffer to disk if the in-memory representation gets too big? That sounds like a good approach, and probably the most useful to me.
>
> It also appears that I can set a maximum in-memory limit like this:
>
> MemoryUsageSetting mus = MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024);
> PDDocument doc = new PDDocument(mus);
>
>> Yes. Although this would mean you'd get an exception if you use more. That's why I recommend the mixed one. You could use the memory limit for stress tests, i.e. create the "worst" possible file and see what you need.

I think I'm okay with an exception in these cases. As I said, our PDFs only end up being a few kiB in size, so I've put a 1MiB cap on the memory-only memory usage strategy for the time being.

I'm curious about what's being constrained here... does PDFBox estimate its current memory usage of various PD* objects in memory and push to disk when that's exceeded, or does it just limit the amount of memory that gets used when serializing out to a stream?

>> Note that only streams are cached. Ordinary java structures (e.g. maps, numbers, strings) are not.

Can you tell me a little more about that? When you say "streams are cached", what does that mean exactly? Or have I essentially already asked that question above?

> ... and then this should enforce a 1MiB size limit, no? I think that's all I want... there shouldn't be any reason for me to have to touch the disk: my files are really quite small. I just don't want something to go wrong with my client code and inadvertently go into an infinite loop adding "Hello World" to the document over and over until I have 50k pages in the PDF and an OOME on my hands.
>
>>>> What you should do is to care to not have anything duplicate. So if you have a company logo on every page, create your object only once. Same for fonts.
> We have something like:
>
> private Font _theFont;
>
> ...
> contentStream.setFont(_theFont);
> contentStream.newLineAtOffset(x,y);
> contentStream.showText("Hello, world");
> ...
>
> Many many times. The Font object reference stays the same, so I'm guessing that's okay and the font is used once and referenced many times, right?
>
>> Yes!
>
>> To create small PDF files, use PDType0Font.load() instead of PDTrueTypeFont.load(), this will subset the fonts after saving.

We are using PDType1Font.FONTNAME for everything, so we aren't calling .load for anything at all.

>>>> And try to have only one content stream per page. (We recently had a guy who had a huge number of content streams and wondered why his PDF was so big).
> Check: we have only one PDPageContentStream per page.
>
> We have a single logo on the first page and nothing repeated.
>
> Our PDFs are almost 100% plain-text with lots of whitespace (which doesn't count, I know). When base64 encoded, they are typically only a few kb in size.
>
> I'm mostly operating from a position of borderline unhealthy paranoia, but I'd rather have a bit of code added to ensure that I don't have to get paged in the middle of the night to restart a service that has suffered an OOME.
>
>> This all sounds harmless. All the memory problems I can think of were related to rendering, not PDF creation.

Sounds good.

>> We've had at least one speed complaint, but that one is solved in the current version.

I'll make sure we are up-to-date.
Thanks,
-chris
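Tilman confirms that keeping one font reference and passing it to every setFont() call is the right pattern. A minimal sketch of that pattern against the PDFBox 2.0 API, assuming one content stream per page and a standard 14 font (PDType1Font.HELVETICA), so nothing is embedded. The class and method names are hypothetical; the commented-out PDType0Font line shows where an embedded font would be loaded once per document instead, with a made-up file path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// Hypothetical helper class; names are illustrative only.
public class FontReuseSketch {

    // Writes one line of text on each of pageCount pages, always with the
    // same PDFont instance, then returns the saved PDF bytes.
    static byte[] writePages(PDDocument doc, PDFont font, int pageCount) throws IOException {
        for (int i = 1; i <= pageCount; i++) {
            PDPage page = new PDPage();
            doc.addPage(page);
            // One content stream per page, as recommended in the thread.
            PDPageContentStream cs = new PDPageContentStream(doc, page);
            try {
                cs.beginText();
                cs.setFont(font, 12);          // same font object on every page
                cs.newLineAtOffset(72, 700);
                cs.showText("Hello, world (page " + i + ")");
                cs.endText();
            } finally {
                cs.close();
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        doc.save(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        PDDocument doc = new PDDocument();
        try {
            // Standard 14 font: no load() call and nothing embedded.
            PDFont font = PDType1Font.HELVETICA;
            // For an embedded font, load it once per document instead, e.g.
            // PDFont font = PDType0Font.load(doc, new File("fonts/example.ttf"));
            // (hypothetical path; PDType0Font subsets the font when saving).
            byte[] pdf = writePages(doc, font, 3);
            System.out.println("PDF size: " + pdf.length + " bytes");
        } finally {
            doc.close();
        }
    }
}
```

Because the font is created once and only referenced from each page's content stream, the saved file should carry a single font resource rather than one per page.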
Re: Limit PDF size
Tilman,

On 8/1/17 3:22 PM, Tilman Hausherr wrote:
> The only thing that comes close to what you want is to create your PDDocument with MemoryUsageSetting.setupMixed(...) as parameter.

So that we can buffer to disk if the in-memory representation gets too big? That sounds like a good approach, and probably the most useful to me.

It also appears that I can set a maximum in-memory limit like this:

MemoryUsageSetting mus = MemoryUsageSetting.setupMainMemoryOnly(1 * 1024 * 1024);
PDDocument doc = new PDDocument(mus);

... and then this should enforce a 1MiB size limit, no? I think that's all I want... there shouldn't be any reason for me to have to touch the disk: my files are really quite small. I just don't want something to go wrong with my client code and inadvertently go into an infinite loop adding "Hello World" to the document over and over until I have 50k pages in the PDF and an OOME on my hands.

> What you should do is to care to not have anything duplicate. So if you have a company logo on every page, create your object only once. Same for fonts.

We have something like:

private PDFont _theFont;

...
contentStream.setFont(_theFont, fontSize);
contentStream.newLineAtOffset(x, y);
contentStream.showText("Hello, world");
...

Many many times. The font object reference stays the same, so I'm guessing that's okay and the font is used once and referenced many times, right?

> And try to have only one content stream per page. (We recently had a guy who had a huge number of content streams and wondered why his PDF was so big).

Check: we have only one PDPageContentStream per page.

We have a single logo on the first page and nothing repeated.

Our PDFs are almost 100% plain-text with lots of whitespace (which doesn't count, I know). When base64 encoded, they are typically only a few kB in size.
I'm mostly operating from a position of borderline unhealthy paranoia, but I'd rather have a bit of code added to ensure that I don't have to get paged in the middle of the night to restart a service that has suffered an OOME.

Thanks for the pointers.

-chris

> Am 01.08.2017 um 20:04 schrieb Christopher Schultz:
> All,
>
> We use PDFBox on a server that must handle many transactions with (somewhat) limited memory. I'd like to limit the amount of memory used to generate our PDFs, which we then serialize to byte-array, base64-encode, etc. for ultimate delivery to some endpoint.
>
> I can obviously limit the number of bytes produced by using a size-limited OutputStream passed-into PDDocument.save(OutputStream), but I'm wondering if PDFBox has any facilities within it to limit the size of the object-tree in memory (or estimate its size, and we can stop operations when it reaches a certain size) so that we don't end up with a multi-GB object-tree that then fails to serialize to byte[] because it is too big.
>
> We are building our PDF documents from scratch, starting with the page definitions, fonts, etc. then adding titles, paragraphs of text, etc. It's all fairly straightforward, and we have full control over the whole process up to and including the call to PDDocument.save(OutputStream).
>
> We are manually constructing our pages as well, so I suppose we could simply limit the number of pages, but I'm more concerned about the size of the memory used and not the number of pages.
>
> Is there anything in PDFBox that can help us with this? We can always count e.g. the number of bytes/characters we have written to the PDF, but that seems less important than what is going on inside of the PDF structure itself.
>
> -chris
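The two MemoryUsageSetting options discussed in this exchange can be compared side by side. A minimal sketch against the PDFBox 2.0 API, assuming the 1 MiB figure from the thread: setupMainMemoryOnly(max) keeps buffers on the heap up to max bytes, and, per Tilman, going over it produces an exception; setupMixed(max) uses up to max bytes of heap and spills the rest to a temporary file. Per the thread, only streams go through this buffer, so it is not a cap on total heap use. The class and method names are hypothetical, and that an over-limit buffer surfaces as an IOException from the PDFBox calls is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper class; names are illustrative only.
public class MemoryCapSketch {
    static final long ONE_MIB = 1024L * 1024L;

    static byte[] build(int pageCount, boolean spillToDisk) throws IOException {
        // Memory-only: stream buffers limited to ONE_MIB of heap.
        // Mixed: up to ONE_MIB on the heap, the remainder in a temp file.
        MemoryUsageSetting mus = spillToDisk
                ? MemoryUsageSetting.setupMixed(ONE_MIB)
                : MemoryUsageSetting.setupMainMemoryOnly(ONE_MIB);
        PDDocument doc = new PDDocument(mus);
        try {
            for (int i = 0; i < pageCount; i++) {
                doc.addPage(new PDPage());
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            return out.toByteArray();
        } finally {
            doc.close();
        }
    }

    public static void main(String[] args) {
        try {
            byte[] pdf = build(5, false);
            System.out.println("Saved " + pdf.length + " bytes");
        } catch (IOException e) {
            // Assumed: exceeding the memory-only cap (or any other I/O failure)
            // is reported here instead of growing the heap further.
            System.err.println("PDF generation failed: " + e.getMessage());
        }
    }
}
```

Tilman's suggestion to stress-test with the memory-only setting fits this shape: build the largest realistic document with spillToDisk set to false and see whether the cap is hit.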
Limit PDF size
All,

We use PDFBox on a server that must handle many transactions with (somewhat) limited memory. I'd like to limit the amount of memory used to generate our PDFs, which we then serialize to byte-array, base64-encode, etc. for ultimate delivery to some endpoint.

I can obviously limit the number of bytes produced by using a size-limited OutputStream passed-into PDDocument.save(OutputStream), but I'm wondering if PDFBox has any facilities within it to limit the size of the object-tree in memory (or estimate its size, and we can stop operations when it reaches a certain size) so that we don't end up with a multi-GB object-tree that then fails to serialize to byte[] because it is too big.

We are building our PDF documents from scratch, starting with the page definitions, fonts, etc. then adding titles, paragraphs of text, etc. It's all fairly straightforward, and we have full control over the whole process up to and including the call to PDDocument.save(OutputStream).

We are manually constructing our pages as well, so I suppose we could simply limit the number of pages, but I'm more concerned about the size of the memory used and not the number of pages.

Is there anything in PDFBox that can help us with this? We can always count e.g. the number of bytes/characters we have written to the PDF, but that seems less important than what is going on inside of the PDF structure itself.
-chris
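The "size-limited OutputStream passed into PDDocument.save(OutputStream)" idea can be sketched with plain JDK classes. LimitedOutputStream below is a hypothetical wrapper, not a JDK or PDFBox class: it counts bytes on their way to the underlying stream and throws an IOException as soon as a write would exceed the limit.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper: fails once more than maxBytes would be written.
public class LimitedOutputStream extends FilterOutputStream {
    private final long maxBytes;
    private long written;

    public LimitedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    // Checks the budget before any bytes reach the underlying stream.
    private void reserve(long n) throws IOException {
        if (written + n > maxBytes) {
            throw new IOException("Output limit of " + maxBytes + " bytes exceeded");
        }
        written += n;
    }

    @Override
    public void write(int b) throws IOException {
        reserve(1);
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        reserve(len);
        // Delegate the whole chunk instead of FilterOutputStream's byte-by-byte loop.
        out.write(b, off, len);
    }

    public long getBytesWritten() {
        return written;
    }
}
```

Usage would look like `doc.save(new LimitedOutputStream(buffer, 1024 * 1024))`, where buffer is, for example, a ByteArrayOutputStream. As the message notes, this only bounds the serialized output: by the time save() runs, the object tree is already in memory, which is why the replies turn to MemoryUsageSetting.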
Re: catch(IOException | COSVisitorException e)
Steve,

On 7/14/17 6:31 AM, Steve Carr wrote:
> Thanks for the info.
> The user uses Java 6 interpreter.
> Do you know if the 2.0.* versions of PDFBox will work on Java 6 platforms?
> If not how can I find code which will run on Java 6

It's worth pointing out that multi-catch is just syntactic sugar: the .class file is the same as it would have been if the catch blocks had been separate (except that you'd see different source lines in any stack traces). So just because PDFBox 2 requires Java 7 to compile, it doesn't require Java 7 to run.

-chris

> On 2017-06-26 19:45 (+0100), Tilman Hausherr wrote:
>> Use the source code examples from the source code download, not from some third party websites.
>>
>> Btw COSVisitorException no longer exists in the 2.0.* versions.
>>
>> Tilman
>>
>> Am 26.06.2017 um 11:41 schrieb Steve Carr:
>>> import java.io.IOException;
>>> import org.apache.pdfbox.exceptions.COSVisitorException;
>>> import org.apache.pdfbox.pdmodel.PDDocument;
>>> import org.apache.pdfbox.pdmodel.PDPage;
>>> /**
>>>  *
>>>  * @author Azeem
>>>  * @Email az...@radixcode.com
>>>  */
>>> When I compile the following code in netbeans I get
>>> Uncompilable source code - package org.apache.pdfbox.exceptions does not exist in relation to catch(IOException | COSVisitorException e)
>>> I downloaded pdfbox-1.6.0-src.zip
>>> help
>>> steve
>>> public class Main {
>>>     public static void main(String[] args) {
>>>         System.out.println("Create Simple PDF file with blank Page");
>>>         String fileName = "EmptyPdf.pdf"; // name of our file
>>>         try {
>>>             PDDocument doc = new PDDocument(); // creating instance of pdfDoc
>>>             doc.addPage(new PDPage()); // adding page in pdf doc file
>>>             doc.save(fileName); // saving as pdf file with name perm
>>>             doc.close(); // cleaning memory
>>>             System.out.println("your file created in : " + System.getProperty("user.dir"));
>>>         }
>>>         catch (IOException | COSVisitorException e) {
>>>             System.out.println(e.getMessage());
>>>         }
>>>     }
>>> }
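To illustrate the multi-catch point at the source level: a Java 6 compiler rejects the `catch (A | B e)` form, so on that toolchain the same handling has to be written as one catch block per exception type. The sketch below puts both forms side by side and shows they behave the same. The class, the helper risky(), and the chosen exception types are hypothetical; the sketch does not check whether the generated class files are byte-for-byte identical. Per Tilman's note, COSVisitorException is gone in 2.0.*, so code written against 2.0 would not need it in the catch at all.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical demo class; names are illustrative only.
public class MultiCatchDemo {

    // Hypothetical operation that can fail in two unrelated ways.
    static void risky(int mode) throws IOException {
        if (mode == 1) {
            throw new FileNotFoundException("missing");
        }
        if (mode == 2) {
            throw new IllegalStateException("bad state");
        }
    }

    // Java 7+ form: one handler for both exception types.
    static String withMultiCatch(int mode) {
        try {
            risky(mode);
            return "ok";
        } catch (IOException | IllegalStateException e) {
            return "caught " + e.getClass().getSimpleName();
        }
    }

    // Java 6 form: the same handling spelled out per exception type.
    static String withSeparateCatches(int mode) {
        try {
            risky(mode);
            return "ok";
        } catch (IOException e) {
            return "caught " + e.getClass().getSimpleName();
        } catch (IllegalStateException e) {
            return "caught " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        for (int mode = 0; mode <= 2; mode++) {
            System.out.println(withMultiCatch(mode) + " / " + withSeparateCatches(mode));
        }
    }
}
```

For each mode, the two methods print the same result, which is the sense in which multi-catch only saves typing.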