Re: [PDFBOX-2.0] PDF Size after Signature
Hi,

Tilman Hausherr thaush...@t-online.de wrote on 27 February 2015 at 07:35:

> Did you just start with signing, or is this a recent phenomenon, i.e. one that didn't happen a month ago?
>
> I looked at both files. In the 1.8 one, only the changed objects appear after EOF. In the 2.0 one, all objects are there?!

Correct, something went wrong when appending only the changed objects. It worked for me when I fixed the encryption stuff. It seems as if some recent change introduced this regression.

@Isaias Which exact version/revision of the trunk are you using?

BR
Andreas Lehmkühler

> Tilman
>
> On 27.02.2015 at 05:44, Isaias Barroso wrote:
>
>> Hi all,
>>
>> I'm using PDFBOX 2.0 to sign some documents and I found that the signed file is much larger than with version 1.8; sometimes the file size increases by 100% or more. When the same file is signed using 1.8, the size increases as expected.
>>
>> Original File: https://www.dropbox.com/s/s8p40ukorhchtcu/sign_me.pdf?dl=0
>> Signed With 1.8: https://www.dropbox.com/s/ty8axylq8ol6204/sign_me_signed_1.8.pdf?dl=0
>> Signed With 2.0: https://www.dropbox.com/s/ge1x3mdpqlalnvq/sign_me_signed_2.0.pdf?dl=0
>>
>> Is there an option to reduce the signed file size in 2.0?
>>
>> Best regards

To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
Re: [PDFBOX-2.0] PDF Size after Signature
Did you just start with signing, or is this a recent phenomenon, i.e. one that didn't happen a month ago?

I looked at both files. In the 1.8 one, only the changed objects appear after EOF. In the 2.0 one, all objects are there?!

Tilman

On 27.02.2015 at 05:44, Isaias Barroso wrote:

> Hi all,
>
> I'm using PDFBOX 2.0 to sign some documents and I found that the signed file is much larger than with version 1.8; sometimes the file size increases by 100% or more. When the same file is signed using 1.8, the size increases as expected.
>
> Original File: https://www.dropbox.com/s/s8p40ukorhchtcu/sign_me.pdf?dl=0
> Signed With 1.8: https://www.dropbox.com/s/ty8axylq8ol6204/sign_me_signed_1.8.pdf?dl=0
> Signed With 2.0: https://www.dropbox.com/s/ge1x3mdpqlalnvq/sign_me_signed_2.0.pdf?dl=0
>
> Is there an option to reduce the signed file size in 2.0?
>
> Best regards
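Tilman's observation (only the changed objects should follow the original EOF) can be checked without any PDF library, because an incremental save leaves the original bytes untouched and appends to them. The following is a minimal sketch, assuming both files fit in memory; `IncrementalSaveCheck` and its methods are hypothetical names and not part of PDFBox. If the appended size is close to, or larger than, the size of the original file, the signing step probably rewrote all objects instead of just the changed ones.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class IncrementalSaveCheck {

    /** Returns true if 'signed' starts with exactly the bytes of 'original'. */
    static boolean isIncrementalUpdate(byte[] original, byte[] signed) {
        if (signed.length < original.length) {
            return false;
        }
        return Arrays.equals(original, Arrays.copyOf(signed, original.length));
    }

    /** Number of bytes appended after the original content, or -1 if the original bytes were changed. */
    static long appendedBytes(byte[] original, byte[] signed) {
        return isIncrementalUpdate(original, signed) ? signed.length - original.length : -1;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IncrementalSaveCheck sign_me.pdf sign_me_signed.pdf
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] signed = Files.readAllBytes(Paths.get(args[1]));
        long appended = appendedBytes(original, signed);
        if (appended < 0) {
            System.out.println("Not an incremental update: the original bytes were rewritten.");
        } else {
            System.out.println("Incremental update, appended bytes: " + appended
                    + " (original size: " + original.length + ")");
        }
    }
}
```

Comparing the appended size for the 1.8 and 2.0 outputs of the same input gives a quick, library-independent measure of the regression discussed in this thread.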
Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present (or variation of it still present)
Hi,

Steve Antoch sant...@yuzu.com wrote on 25 February 2015 at 00:04:

> Hi Andreas-
>
> Thanks again. I downloaded and built the latest from trunk. There was no change for the book I was testing. I first tried it after taking out my if (streamOffset > 0) test, but the null reference exception still occurred.

OK, thanks again for testing. I've fixed the issue based on your analysis.

> We are planning on running a large breadth test on approximately 108,000 PDFs starting tonight. I will let you know how this test goes. It will take about 4 days to complete.

Cool, I'm looking forward to seeing the results.

> With respect to the small change I made in my fork:
> https://github.com/santoch/pdfbox/commit/75cc32ab8307062709c30f1cfea5e2fdb8c00ddd
>
> The issue was a separate but fairly rare failure that we found in a small number (about 10) of our PDFs. Adobe and Pdfium (Chrome) were both able to open them, but PDFBox was not, because it disallows nesting. I figured that if Pdfium allows 64 levels of nesting, we might be able to relax this test from 0 levels to allowing 1 level and see if it worked. Since it did, I wanted to run those changes by you for your comments.

Is there any chance of getting hold of a sample PDF? It would be good enough to send it to me via private mail.

BR
Andreas Lehmkühler

> Thanks-
> Steve
>
> From: Andreas Lehmkühler andr...@lehmi.de
> Sent: Tuesday, February 24, 2015 3:30 AM
> To: users@pdfbox.apache.org
> Subject: Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present (or variation of it still present)
>
>> Hi Steve,
>>
>> Steve Antoch sant...@yuzu.com wrote on 23 February 2015 at 19:42:
>>
>>> @Andreas-
>>> I have downloaded the latest trunk and came close (it got much further) before failing. However, I think I may have a fix for that failure:
>>
>> Thanks for the test
>>
>>> The code is returning 0 when the xrefstm fixedOffset is not found. However, the code still tries to load and parse from xref 0, resulting in a null reference exception later in parser.parse().
>>
>> Your analysis is correct, but I hope that my latest improvements eliminate such cases, see PDFBOX-2572 for details. Could you give the latest trunk (r1661747) a try?
>>
>>> However, thinking about this, I came up with this:
>>>
>>>     // check for a XRef stream, it may contain some object ids of compressed objects
>>>     if (trailer.containsKey(COSName.XREF_STM))
>>>     {
>>>         int streamOffset = trailer.getInt(COSName.XREF_STM);
>>>         // check the xref stream reference
>>>         fixedOffset = checkXRefStreamOffset(streamOffset, false); // <== fixedOffset comes back as 0 = not found
>>>         if (fixedOffset > -1 && fixedOffset != streamOffset)
>>>         {
>>>             streamOffset = (int) fixedOffset; // <== streamOffset gets set to 0 here
>>>             trailer.setInt(COSName.XREF_STM, streamOffset);
>>>         }
>>>         if (streamOffset > 0) // I added this test because an xref stream starting at
>>>                               // offset 0 can never happen, so we should simply skip it
>>>         {
>>>             pdfSource.seek(streamOffset);
>>>             skipSpaces();
>>>             parseXrefObjStream(prev, false); // <== this call ultimately throws a null ref exception if streamOffset == 0 on entry
>>>         }
>>>     }
>>>
>>> Adding that, the file successfully parses.
>>>
>>> Also, there was this proposal that I put up on GitHub in a repo that I forked directly from pdfbox (it is the only change). It relaxes the looping a bit to allow limited recursion. I would appreciate your thoughts on it.
>>
>> Is this change related to the issue discussed above?
>>
>>> https://github.com/santoch/pdfbox/commit/75cc32ab8307062709c30f1cfea5e2fdb8c00ddd
>>>
>>> Thank you so much! You have been tremendously helpful. I wish I could have given you the files, but unfortunately, they are proprietary and we cannot release them. :-(
>>
>> No need to worry, you are not the only one who is not allowed to share a specific PDF.
>>
>>> Best regards-
>>> Steve
>>
>> BR
>> Andreas Lehmkühler
>>
>>> From: Andreas Lehmkühler andr...@lehmi.de
>>> Sent: Monday, February 23, 2015 3:43 AM
>>> To: users@pdfbox.apache.org
>>> Subject: Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present (or variation of it still present)
>>>
>>>> Hi,
>>>>
>>>> I've improved the self-repair mechanism of the trunk based on Steve's report.
>>>>
>>>> @Steve Please give the newest trunk version/SNAPSHOT a try. Does the issue still persist?
>>>>
>>>> BR
>>>> Andreas Lehmkühler
>>>>
>>>> Steve Antoch sant...@yuzu.com wrote on 17 February 2015 at 00:05:
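The guard Steve proposes in this thread boils down to a small decision: use the repaired offset if the check returned one, and skip the xref stream entirely when the result is not a positive offset. The following is a standalone sketch of that decision only. `XrefStreamOffsetGuard`, `resolve` and `SKIP` are hypothetical names, the sketch assumes (as described in the thread) that a failed lookup comes back as 0, and it is not the code that went into PDFBox.

```java
final class XrefStreamOffsetGuard {

    /** Sentinel meaning "do not try to parse an xref stream". */
    static final int SKIP = -1;

    /**
     * @param declaredOffset offset taken from the trailer's XRefStm entry
     * @param fixedOffset    result of the offset check; 0 means "not found" (assumption from the thread)
     * @return the offset to seek to, or SKIP if there is no usable xref stream offset
     */
    static int resolve(int declaredOffset, long fixedOffset) {
        int offset = declaredOffset;
        if (fixedOffset > -1 && fixedOffset != declaredOffset) {
            offset = (int) fixedOffset; // becomes 0 when the lookup failed
        }
        // An xref stream cannot start at offset 0, so anything that is
        // not strictly positive is treated as "nothing to parse".
        return offset > 0 ? offset : SKIP;
    }

    public static void main(String[] args) {
        System.out.println(resolve(1234, 1234)); // declared offset confirmed by the check
        System.out.println(resolve(1234, 1300)); // offset repaired by the check
        System.out.println(resolve(1234, 0));    // lookup failed, xref stream is skipped
    }
}
```

Returning a sentinel keeps the "seek and parse" step in one place at the call site, which is the same effect as the added `if (streamOffset > 0)` test in the quoted snippet.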
Re: Error on PDFRenderer.renderImage (PDFBox 2.0)
Hi Kevin,

A new snapshot should be available within a few hours. It is slightly faster, and your file will be processed if you use the JPEG reader from https://github.com/haraldk/TwelveMonkeys .

Either add this to your pom file (if you use Maven)

    <dependency>
        <groupId>com.twelvemonkeys.imageio</groupId>
        <artifactId>imageio-jpeg</artifactId>
        <version>3.0.2</version>
    </dependency>

or add six(!) files to your classpath as described at that URL.

My understanding is that TwelveMonkeys tries to process every broken JPEG file, similar to us trying to process every broken PDF file.

Tilman

On 25.02.2015 at 15:04, Kevin Morin wrote:

> Hi Tilman,
>
> great news! When do you think this will be available in the snapshot?
>
> BR
> Kevin
>
> On 24/02/2015 at 00:36, Tilman Hausherr wrote:
>
>> Some good news: your file can be rendered with TwelveMonkeys. See the bottom of https://issues.apache.org/jira/browse/PDFBOX-2128
>>
>> Tilman
>>
>> On 03.02.2015 at 11:34, Kevin Morin wrote:
>>
>>> Hi Tilman,
>>>
>>> I tried with JPedal and it works... and it also works with Java 8. Do you have a clue on how to solve this?
>>>
>>> BR
>>> Kevin
>>>
>>> On 01/02/2015 16:06, Tilman Hausherr wrote:
>>>
>>>> No good news - I extracted the JPEG file with NOTEPAD++ and get the same error with ImageIO. What does work is
>>>>
>>>>     JPEGImageDecoder imageDecoder = JPEGCodec.createJPEGDecoder(...);
>>>>     raster = imageDecoder.decodeAsRaster();
>>>>
>>>> but then the colors are black and violet :-(
>>>>
>>>> Tilman
>>>>
>>>> On 30.01.2015 at 12:00, Kevin Morin wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I get the following error when I try to render a PDF file (I cannot send it to a public list, but I can send it privately if needed). It happens with PDFBox 2.0 under Linux and Windows with Java 7, but not with Java 8.
>>>>>
>>>>> Invalid image format
>>>>>     sun.java2d.cmm.kcms.CMM.checkStatus(CMM.java:180)
>>>>>     sun.java2d.cmm.kcms.CMM.createTransform(CMM.java:134)
>>>>>     java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:540)
>>>>>     com.sun.imageio.plugins.jpeg.JPEGImageReader.acceptPixels(JPEGImageReader.java:1263)
>>>>>     com.sun.imageio.plugins.jpeg.JPEGImageReader.readImage(Native Method)
>>>>>     com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1231)
>>>>>     com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1034)
>>>>>     javax.imageio.ImageReader.read(ImageReader.java:940)
>>>>>     org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:72)
>>>>>     org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:463)
>>>>>     org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:417)
>>>>>     org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:363)
>>>>>     org.apache.pdfbox.cos.COSStream.getDecodeResult(COSStream.java:303)
>>>>>     org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:115)
>>>>>     org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:65)
>>>>>     org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:193)
>>>>>     org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:42)
>>>>>     org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:803)
>>>>>     org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:465)
>>>>>     org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:439)
>>>>>     org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
>>>>>     org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:163)
>>>>>     org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:204)
>>>>>     org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:137)
>>>>>     org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:70)
>>>>>
>>>>> Thanks for your help
>>>>>
>>>>> Best
>>>>> Kevin
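To check whether an additional JPEG plugin such as TwelveMonkeys has actually been picked up from the classpath, one can list the JPEG readers registered with ImageIO. This sketch uses only the standard `javax.imageio` API. The class name `JpegReaderCheck` is hypothetical, and the expectation that a TwelveMonkeys reader would show up under a `com.twelvemonkeys` package name is an assumption based on the Maven groupId in this thread, not something verified here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

final class JpegReaderCheck {

    /** Class names of all ImageIO readers currently registered for the JPEG format. */
    static List<String> jpegReaderClassNames() {
        ImageIO.scanForPlugins(); // pick up plugins that were added to the classpath
        List<String> names = new ArrayList<String>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("jpeg");
        while (readers.hasNext()) {
            ImageReader reader = readers.next();
            names.add(reader.getClass().getName());
            reader.dispose(); // release any resources held by the reader instance
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : jpegReaderClassNames()) {
            System.out.println(name);
        }
    }
}
```

If only the JDK's own reader is listed after adding the dependency, the plugin jars are most likely missing from the runtime classpath.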
Re: font errors when reading PDF (not writing)
John-

(sorry to hijack - I think this is related enough that it warrants asking here)

If we run pdfbox on a headless server, will the Encrypt() class still function properly? We do not render anything, just encrypt the document. My suspicion is that this is not an issue, though it would be nice to be sure.

Thanks-
Steve

From: John Hewson j...@jahewson.com
Sent: Wednesday, February 25, 2015 1:27 PM
To: users@pdfbox.apache.org
Subject: Re: font errors when reading PDF (not writing)

> Are you running on a headless system, such as a server? If so, you probably don’t have any fonts installed. Even though you’re just doing text extraction, this matters because the dimensions of the characters need to be taken into account and many PDFs do not embed the fonts which they depend on.
>
> At a bare minimum I’d recommend installing the Liberation fonts and whichever Microsoft fonts are available in your distribution’s package manager.
>
> -- John
>
> On 25 Feb 2015, at 06:12, Juan M Uys opy...@gmail.com wrote:
>
>> Hello,
>>
>> I'm extracting text from PDFs using PDFTextStripperByArea and get a lot of these in the log:
>>
>>     Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.ExternalFonts getTrueTypeFallbackFont
>>     SEVERE: No TTF fallback font for 'Helvetica'
>>     Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
>>     WARNING: Using fallback font 'LiberationSans' for 'ArialMT'
>>
>> I've searched the documentation for font-related advice, which seems to pertain to WRITING PDFs, whereas I'm merely extracting text.
>>
>> Please let me know how to get around this problem. Do I need to install extra font packages? If so, how? Where from?
>>
>> At the very least, I'd like to know how to remove these statements from my log. (I've tried throwing logback.xml and log4j.properties into my resources folder, setting package org.apache.pdfbox to INFO, to no avail)
>>
>> The system running my extractor code is stock Ubuntu 14.04 with Azul openjdk 7 (see https://registry.hub.docker.com/u/azul/zulu-openjdk/dockerfile/)
>>
>> Thanks,
>> Juan
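The log lines quoted in this thread have the layout of `java.util.logging` output, which would explain why logback.xml and log4j.properties had no effect. Under that assumption (it is not confirmed anywhere in the thread), the `org.apache.pdfbox` loggers can be quietened programmatically with the standard JUL API. This is only a sketch, and `QuietPdfBoxLogging` is a hypothetical class name.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietPdfBoxLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a configured logger could otherwise be garbage collected.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    /** Suppresses everything logged under org.apache.pdfbox, including SEVERE. */
    static void silence() {
        PDFBOX_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // Child loggers without their own level inherit the setting from the parent.
        Logger fontLogger = Logger.getLogger("org.apache.pdfbox.pdmodel.font.ExternalFonts");
        System.out.println("SEVERE still logged: " + fontLogger.isLoggable(Level.SEVERE));
    }
}
```

This only hides the messages. The missing-font condition they report is still there, so installing fonts as John suggests remains the actual fix.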
Re: font errors when reading PDF (not writing)
Yep, it’ll work fine. If you’re using AES 256 you’ll need the Java “unlimited security” files installed with your JVM.

-- John

On 26 Feb 2015, at 12:35, Steve Antoch sant...@yuzu.com wrote:

> John-
>
> (sorry to hijack - I think this is related enough that it warrants asking here)
>
> If we run pdfbox on a headless server, will the Encrypt() class still function properly? We do not render anything, just encrypt the document. My suspicion is that this is not an issue, though it would be nice to be sure.
>
> Thanks-
> Steve
>
> From: John Hewson j...@jahewson.com
> Sent: Wednesday, February 25, 2015 1:27 PM
> To: users@pdfbox.apache.org
> Subject: Re: font errors when reading PDF (not writing)
>
>> Are you running on a headless system, such as a server? If so, you probably don’t have any fonts installed. Even though you’re just doing text extraction, this matters because the dimensions of the characters need to be taken into account and many PDFs do not embed the fonts which they depend on.
>>
>> At a bare minimum I’d recommend installing the Liberation fonts and whichever Microsoft fonts are available in your distribution’s package manager.
>>
>> -- John
>>
>> On 25 Feb 2015, at 06:12, Juan M Uys opy...@gmail.com wrote:
>>
>>> Hello,
>>>
>>> I'm extracting text from PDFs using PDFTextStripperByArea and get a lot of these in the log:
>>>
>>>     Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.ExternalFonts getTrueTypeFallbackFont
>>>     SEVERE: No TTF fallback font for 'Helvetica'
>>>     Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
>>>     WARNING: Using fallback font 'LiberationSans' for 'ArialMT'
>>>
>>> I've searched the documentation for font-related advice, which seems to pertain to WRITING PDFs, whereas I'm merely extracting text.
>>>
>>> Please let me know how to get around this problem. Do I need to install extra font packages? If so, how? Where from?
>>>
>>> At the very least, I'd like to know how to remove these statements from my log. (I've tried throwing logback.xml and log4j.properties into my resources folder, setting package org.apache.pdfbox to INFO, to no avail)
>>>
>>> The system running my extractor code is stock Ubuntu 14.04 with Azul openjdk 7 (see https://registry.hub.docker.com/u/azul/zulu-openjdk/dockerfile/)
>>>
>>> Thanks,
>>> Juan
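Whether a given JVM permits 256-bit AES, as John mentions, can be checked up front with the standard `javax.crypto.Cipher.getMaxAllowedKeyLength` call instead of waiting for encryption to fail. A small sketch follows; `AesKeyLengthCheck` is a hypothetical class name, and the remark that a restricted JVM typically reports 128 is general JCE behaviour rather than something stated in the thread.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

final class AesKeyLengthCheck {

    /** Maximum AES key length in bits permitted by the installed policy, or -1 if AES is unavailable. */
    static int maxAesKeyLength() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            return -1;
        }
    }

    /** True if the installed policy permits AES keys of at least 256 bits. */
    static boolean aes256Allowed() {
        return maxAesKeyLength() >= 256;
    }

    public static void main(String[] args) {
        System.out.println("Max AES key length: " + maxAesKeyLength());
        System.out.println("AES-256 allowed: " + aes256Allowed());
    }
}
```

Running this on the target server before deploying shows whether the policy files still need to be installed there.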
[PDFBOX-2.0] PDF Size after Signature
Hi all,

I'm using PDFBOX 2.0 to sign some documents and I found that the signed file is much larger than with version 1.8; sometimes the file size increases by 100% or more. When the same file is signed using 1.8, the size increases as expected.

Original File: https://www.dropbox.com/s/s8p40ukorhchtcu/sign_me.pdf?dl=0
Signed With 1.8: https://www.dropbox.com/s/ty8axylq8ol6204/sign_me_signed_1.8.pdf?dl=0
Signed With 2.0: https://www.dropbox.com/s/ge1x3mdpqlalnvq/sign_me_signed_2.0.pdf?dl=0

Is there an option to reduce the signed file size in 2.0?

Best regards

--
Isaías Barroso
Belo Horizonte - MG