Another PDF Merging Error
Trying to merge this file with itself:

https://www.dropbox.com/s/6dd0wk5ri80sn2v/claire%20dempster.pdf?dl=0

I issue this command and receive the error below:

~: java -jar /Users/marlon/Desktop/pdfbox-app-1.8.8.jar PDFMerger /Users/marlon/Dropbox/Taras-Marlon/bpm/problem\ PDF\ files/claire\ dempster.pdf /Users/marlon/Dropbox/Taras-Marlon/bpm/problem\ PDF\ files/claire\ dempster.pdf /Users/marlon/Desktop/untitled\ folder/test.pdf

Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
WARNING: expected='%%EOF' actual=''
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 0
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
WARNING: expected='%%EOF' actual=''
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 0

Thanks,
Marc
Re: Not merging all pages
Thanks Tilman…awesome!

Thanks,
Marc
Not merging all pages
Yet another merging problem. After merging this 53-page PDF file with itself using PDFBox 1.8.8, I only get 16 pages total (8 pages recognized by PDFBox):

https://www.dropbox.com/s/s6pybruhrm3bvki/helen%20shih.pdf?dl=0

Thanks,
Marc
Re: PDFBox-generated PDF/A-1b not validated using PDFBox validation
I didn't test the file from your code, only the one from the example, both with preflight and with http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx

If you want to be really sure, when you continue with development, check your files with the 2.0 version of the preflight app command line:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/preflight-app/2.0.0-SNAPSHOT/

because some of the improvements in preflight haven't been made to 1.8.

Tilman
Re: Another PDF Merging Error
The file is broken... pdf.js doesn't display it at all. The end of the file has this:

trailer
/Size 426/Root 1 0 R
xref
0 0
trailer
/Size 426/Prev 374525/XRefStm 4291/Root 1 0 R/Info 3 0 R/ID[2871C4EA28E5E14CA4CC8054ABC9A97364AEDCA53786314891752768DFE1ADC1]
star

And yes, it has all the problems reported. And PDFBox does merge the files.

Tilman
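The two warnings in this thread (expected='%%EOF' and startxref position 0) both concern the last few lines of the file, and the dump above stops at "star", which suggests the file was cut off before the startxref keyword was complete. A minimal sketch of a pre-check for that condition follows. It uses only the Java standard library; the class and method names are hypothetical and this is not how PDFBox itself parses the trailer.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper with hypothetical names; not part of PDFBox. */
class PdfTailCheck {

    /** True if the last non-whitespace bytes of the file are "%%EOF". */
    static boolean endsWithEof(byte[] pdf) {
        return tailAsString(pdf, 1024).trim().endsWith("%%EOF");
    }

    /**
     * Returns the number that follows the last "startxref" keyword in the
     * final 1024 bytes, or -1 if the keyword or the number is missing.
     */
    static long lastStartXref(byte[] pdf) {
        String tail = tailAsString(pdf, 1024);
        int idx = tail.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        String rest = tail.substring(idx + "startxref".length()).trim();
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(rest.substring(0, end));
    }

    /** Decodes at most the last max bytes; ISO-8859-1 maps every byte to one char. */
    private static String tailAsString(byte[] pdf, int max) {
        int from = Math.max(0, pdf.length - max);
        return new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
    }
}
```

A caller could run both checks on a file's bytes before merging and report the file as truncated when endsWithEof is false or lastStartXref returns -1.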
Re: Not merging all pages
Use the nonSeq parser option, then you get 53 :-)

Tilman
Re: merging docs with form fields and tags
I really want to be dropped from the pdfbox email list.

Ray Morris
ray.morris.brisb...@bigpond.com

-Original Message-
From: David Hill
Sent: Thursday, January 22, 2015 5:32 AM
To: users@pdfbox.apache.org
Subject: merging docs with form fields and tags

We are trying to merge multiple PDFs together and we are having issues with form fields and tags vanishing. Tags and editable form fields will not exist in the output. This seems related to issues 1031 and 930.

The problem seems to happen if the first document does not have fields or tags. In the case where the first document does not have form fields the form fields for the second document will be missing but all subsequent form fields seem to exist properly in the resulting document. In the case where the first document does not have tags, all tags on all pages are missing in the resulting document.

This is a high priority for our current 508 compliance project, we may attempt to correct the issue ourselves. Does anyone have any ideas what the problem might be and where we might start looking for a solution?

Thanks!

Dave Hill
Lead Developer
Iowa Student Loan Liquidity Corp

This e-mail and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the originator of the message. This footer also confirms that this e-mail message has been scanned for the presence of computer viruses. Any views expressed in this message are those of the individual sender, except where the sender specifies and with authority, states them to be the views of Iowa Student Loan.
Re: PDFBox-generated PDF/A-1b not validated using PDFBox validation
On 21.01.2015 at 12:31, Julien Béti wrote:
> serializer.serialize(xmp, baos, false);

Change false to true; it should work then.

Tilman
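The validation error in this thread says the XMP "should start with a processing instruction", and the third argument of serialize appears to control whether that xpacket wrapper is written. A rough sketch of how one might inspect the serialized bytes before importing them is shown below. It uses only the Java standard library; the class and method names are hypothetical, and the check is a simple prefix test rather than the logic Preflight applies.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper with hypothetical names; not part of PDFBox or xmpbox. */
class XmpPacketCheck {

    /** True if the bytes begin with an XMP packet header such as <?xpacket begin="..." id="..."?>. */
    static boolean startsWithXpacket(byte[] xmp) {
        // Only the first bytes matter for a prefix test.
        String head = new String(xmp, 0, Math.min(xmp.length, 64), StandardCharsets.UTF_8);
        // Skip a leading byte order mark if the serializer wrote one.
        if (head.startsWith("\uFEFF")) {
            head = head.substring(1);
        }
        return head.startsWith("<?xpacket begin=");
    }
}
```

Calling startsWithXpacket(baos.toByteArray()) right after serializing would show whether the wrapper is present before the metadata is attached to the document.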
Re: Runtime exception in PDFParser.parse()
You don’t have to do anything to log runtime exceptions in Java, they are passed up the stack until the process terminates and prints a stack trace. Trying to catch runtime exceptions yourself isn’t necessary or advised. Remove your exception handling code and have main() throw IOException.

-- John
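If some handling around the parse call is still wanted, one option consistent with the advice above is to catch only the checked IOException, hand the exception object itself to the logger so the stack trace is kept, and chain it as the cause when rethrowing. The sketch below assumes java.util.logging as a stand-in for log4j so that it is self-contained; the Parser interface and all names are hypothetical placeholders for the real PDDocument.load call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative sketch with hypothetical names; java.util.logging stands in for log4j. */
class ParseFailureLogging {

    private static final Logger LOG = Logger.getLogger(ParseFailureLogging.class.getName());

    /** Placeholder for the real parsing call. */
    interface Parser {
        void parse(InputStream in) throws IOException;
    }

    /**
     * Runs the parser. Checked I/O failures are logged with their stack trace
     * and rethrown with the original exception as the cause. Runtime
     * exceptions and errors are not caught and propagate unchanged.
     */
    static void parseOrWrap(Parser parser, InputStream in, String filename) throws IOException {
        try {
            parser.parse(in);
        } catch (IOException e) {
            String message = "Failure when parsing PDF " + filename;
            // Passing the Throwable (not just its message) keeps the stack trace in the log.
            LOG.log(Level.WARNING, message, e);
            throw new IOException(message, e);
        }
    }
}
```

Because the original exception travels along as the cause, whatever finally prints the stack trace also shows where the failure started.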
Re: merging docs with form fields and tags
You can unsubscribe at http://pdfbox.apache.org/mailinglists.html

-- John
Re: Error on PDDocument.load
Hi Kevin,

works for me - what's your Java Version?

BR
Maruan
Re: Error on PDDocument.load
Hi Kevin,

you can test with the PDFToImage command [1], available from the pdfbox-app [2], to see whether the issue happens there. The source for PDFToImage is available in the tools section of the SVN repo or viewable online [3].

BR
Maruan

[1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
[2] https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
[3] http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup
Re: Error on PDDocument.load
Hi,

it does not work with PDFToImage either, I still get a blank image. Also, I did not set the nonSeq option, yet it seems to be using the non-sequential parser. And I have the following traces:

janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
GRAVE: Can't find the object 7 0 (origin offset 359138)
janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
GRAVE: Missing XObject: Im1

BR
Kevin
Re: Error on PDDocument.load
I thought I was running java 7 but it's java 8... I tried with java 7 and it works. I do not need it to work with java 8, java 7 is ok for me.

Thanks for your help and for all your work.

Kevin
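Since this thread turns on which Java runtime was actually in use, it can help to have the application report that itself instead of relying on what is assumed to be installed. A small sketch using standard system properties is given below; the class name is hypothetical.

```java
/** Illustrative helper with a hypothetical name: reports the runtime the code is really running on. */
class RuntimeInfo {

    /** Builds a one-line summary from standard JVM system properties. */
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vendor=" + System.getProperty("java.vendor")
                + ", os.name=" + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line once at startup, on both the machine where rendering works and the one where it does not, makes a runtime mismatch visible without guessing.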
Re: Error on PDDocument.load
Hi,

On 21 January 2015 at 12:14, Kevin Morin mo...@codelutin.com wrote:
> I thought I was running java 7 but it's java 8... I tried with java 7 and it works. I do not need it to work with java 8, java 7 is ok for me.

It works for me using java 8 on win7 and linux as well. I guess the issue has to be something else.

BR
Andreas Lehmkühler
PDFBox-generated PDF/A-1b not validated using PDFBox validation
Hello,

Attached, you'll find a sample java code which creates a PDF/A-1b file using CreatePDFA sample [1], and then immediately tries to validate it using Preflight as described in Cookbook [2].

The validation fails with the following error:

--8
The file/tmp/test.pdf is not valid, error(s) :
7.1 : Error on MetaData, xmp should start with a processing instruction
--8

I'm using PDFBox to generate PDF/A files which are validated by an application which obviously uses PDFBox for validation, and my PDF files are rejected for the moment, with the same error.

Could you please tell me what's missing in CreatePDFA sample to make it pass the validation?

Kind Regards,
Julien.

[1] http://svn.apache.org/viewvc/pdfbox/branches/1.8/examples/src/main/java/org/apache/pdfbox/examples/pdfa/CreatePDFA.java?revision=1620380&view=markup
[2] http://pdfbox.apache.org/1.8/cookbook/pdfavalidation.html

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import javax.activation.FileDataSource;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.exception.SyntaxValidationException;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.type.BadFieldValueException;
import org.apache.xmpbox.xml.XmpSerializationException;
import org.apache.xmpbox.xml.XmpSerializer;

/**
 * This is an example that creates a simple PDF/A document.
 *
 */
public class CreateAndCheckPDFA {
    /**
     * Constructor.
     */
    public CreateAndCheckPDFA() {
        super();
    }

    /**
     * Create a simple PDF/A document.
     *
     * This example is based on HelloWorld example.
     *
     * As it is a simple case, to conform the PDF/A norm, are added :
     * - the font used in the document
     * - a light xmp block with only PDF identification schema (the only mandatory)
     * - an output intent
     *
     * @param file The file to write the PDF to.
     * @param message The message to write in the file.
     *
     * @throws Exception If something bad occurs
     */
    public void doIt(String file, String message) throws Exception {
        // the document
        PDDocument doc = null;
        try {
            doc = new PDDocument();
            PDPage page = new PDPage();
            doc.addPage(page);

            // load the font from pdfbox.jar
            InputStream fontStream = CreateAndCheckPDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/ArialMT.ttf");
            PDFont font = PDTrueTypeFont.loadTTF(doc, fontStream);

            // create a page with the message where needed
            PDPageContentStream contentStream = new PDPageContentStream(doc, page);
            contentStream.beginText();
            contentStream.setFont(font, 12);
            contentStream.moveTextPositionByAmount(100, 700);
            contentStream.drawString(message);
            contentStream.endText();
            contentStream.saveGraphicsState();
            contentStream.close();

            PDDocumentCatalog cat = doc.getDocumentCatalog();
            PDMetadata metadata = new PDMetadata(doc);
            cat.setMetadata(metadata);

            XMPMetadata xmp = XMPMetadata.createXMPMetadata();
            try {
                PDFAIdentificationSchema pdfaid = xmp.createAndAddPFAIdentificationSchema();
                pdfaid.setConformance("B");
                pdfaid.setPart(1);
                pdfaid.setAboutAsSimple("PDFBox PDFA sample");
                XmpSerializer serializer = new XmpSerializer();
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                serializer.serialize(xmp, baos, false);
                metadata.importXMPMetadata(baos.toByteArray());
            } catch (BadFieldValueException badFieldexception) {
                // can't happen here, as the provided value is valid
            } catch (XmpSerializationException xmpException) {
                System.err.println(xmpException.getMessage());
            }
Re: Error on PDDocument.load
Hi Andreas,

I am using the latest snapshot available on the maven repository. I am running my app on Windows Server 2008 R2 Standard and it does not work (white page). Could you send me the code or a jar to test on this server, to check whether the problem comes from my code?

BR
Kevin

On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
> Hi,
>
> On 19.01.2015 at 12:45, Kevin Morin wrote:
>> Actually, the issue is not only these traces. The real issue is that I have a blank image when I try to render the document.
>
> I've checked your PDF and everything renders fine. I've tried SNAPSHOT-891 on linux (running java 1.8, 1.7 and 1.6) and the latest SNAPSHOT-947 on win7 running java 1.7.
>
> Maybe your SNAPSHOT is outdated?
>
> BR
> Andreas Lehmkühler
>
>> On 19/01/2015 12:39, Kevin Morin wrote:
>>> Hi,
>>>
>>> I am using the 2.0 snapshot version to render PDFs to images, but on some documents, I have the following error when I call PDDocument.load(file):
>>>
>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find the object 7 0 (origin offset 359138)
>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject: Im1
>>>
>>> I first had it a few days ago (I did not report it, shame on me) but the error did not occur when I called the loadLegacy method on PDDocument. But the loadLegacy method is not available anymore...
>>>
>>> The issue happens on Windows (works fine on Debian).
>>>
>>> Thanks for your help
>>> Kevin
Runtime exception in PDFParser.parse()
Hi

Recently I started getting runtime exceptions in PDFParser.parse(), but I can't seem to get any information out as to why. The weird thing is that it only happens on Linux - on my laptop it is fine! I have tried to get a stack trace (my code uses log4j), but all I get is:

org.slf4j.impl.Log4jLoggerAdapter.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V

This is produced by my code:

try {
    log.info("Parsing PDF document ...");
    document = PDDocument.load(is, true); // skip corrupt pdf objects
} catch (Throwable t) {
    message = "Failure when parsing PDF " + filename + ": " + t.getMessage();
    if (t.getCause() != null) {
        message += " | CAUSE: " + t.getCause().getMessage();
    }
    log.warn(t);
    log.debug(message);
    throw new Exception(message);
}

I have also tried to turn logging up in my log4j.properties:

log4j.logger.org.apache.pdfbox=TRACE

But I see nothing except the slf4j line above. Can anyone tell me what I am doing wrong?

Thanks

- Chris

Chris Bamford
m: +44 7860 405292
www.mimecast.com
Mimecast, CityPoint, One Ropemaker Street, London, EC2Y 9AW
+44 (0) 207 847 8700

Disclaimer

The information contained in this communication from cbamf...@mimecast.com sent at 2015-01-21 14:24:10 is confidential and may be legally privileged. It is intended solely for use by users@pdfbox.apache.org and others authorized to receive it. If you are not users@pdfbox.apache.org you are hereby notified that any disclosure, copying, distribution or taking action in reliance of the contents of this information is strictly prohibited and may be unlawful. Mimecast Ltd. is a company registered in England and Wales with the company number 4698693 VAT No. GB 123 4197 34 Registered Office: CityPoint, One Ropemaker Street, Moorgate, London, EC2Y 9AW

This email message has been scanned for viruses by Mimecast. Mimecast delivers a complete managed email solution from a single web based platform. For more information please visit www.mimecast.com