Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present (or variation of it still present)
Hi,

I've improved the self-repair mechanism of the trunk based on Steve's report.

@Steve Please give the newest trunk version/SNAPSHOT a try. Does the issue still persist?

BR
Andreas Lehmkühler

Steve Antoch sant...@yuzu.com wrote on 17 February 2015 at 00:05:

Andreas-

Thanks for the response. Sorry for sending directly.

Yes, it tries to read from offset 112085940, but does not find the xrefstm there, so that's when it goes searching. It seems to be landing in the middle of something else (perhaps an image?).

I tried running the preflight command on the file, and this is what it found there. This is in the middle of a whole series of repetitive byte patterns like these, which are interspersed with other sections of content that are also binary only.

?xml version=1.0 encoding=UTF-8 standalone=no? preflight name=file.pdf executionTimeMS2646/executionTimeMS isValid type=false/isValid errors count=1 error count=1 code1.0/code detailsSyntax error, Error: Expected a long type at offset 112085940, instead got '6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ³fÍ#155;6lÙ±¯Óz·C#156;3Í}#14;y#11;ó#3;£g#130;?1º·Ó#158;-ó#143;VÏ:ë½NsË#142;¸#31;6lÙ³fÅ#ë#147;#29;#31;¨Î÷å.£=#137;ù}ÕsÞÿ'/details /error /errors /preflight

The patterns seem to be lots of these:

6lÙ³fÍ#155;

interspersed between blocks that are similar to this:

±¯Óz·C#156;3Í}#14;y#11;ó#3;£g#130;?1º·Ó#158;-ó#143;VÏ:ë½NsË#142;¸#31;6lÙ³fÅ#ë#147;#29;#31;¨Î÷å.£=#137;ù}ÕsÞÿ'

It just so happens that the offset 112085940 falls right in the middle of a big block of those 6lÙ³fÍ#155; repetitive blocks. Not sure if that's any help.

Steve

From: Andreas Lehmkühler andr...@lehmi.de
Sent: Monday, February 16, 2015 3:34 AM
To: users@pdfbox.apache.org
Subject: Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present (or variation of it still present)

Hi,

Steve Antoch sant...@yuzu.com wrote on 13 February 2015 at 23:34:

Hi Tilman and Andreas--

Please don't contact developers directly, use our mailing lists instead. I've put the users list back in the loop...

I am working with Krasimir on this issue. Although we asked, we were denied permission to send the document out. :-(

The failure is being triggered when we attempt to use the Encrypt() class to password protect the pdf. We end up with the "Expected a long type at offset 113884174, instead got 'xref'" failure.

I have debugged into the PDFBox code and found the offending parts. PdfBox is trying to parse an xref table located at 113884174. The problem we are seeing is that inside the trailer it finds the /XRefStm label, and its offset value is returned as 112085940 (which is what is given in the file). However, the checkXRefOffset() call made to verify it doesn't find the xref stream there, so it goes searching and ends up returning the closest xref offset it can find, which happens to be its own offset at 113884174.

I believe that there is an error in PdfBox with respect to this fixup logic, even if it had found the 'correct' xref stream. That is because the fixup offset can NEVER work. Every time it fixes up the location, it lands on a section which begins with xref. The next call is to skip the whitespace, but since there is never any there (it's already proven to be 'xref'), it does not advance the input stream. Then, the first call to parse that xrefstm always calls readObjectID(), which will always throw the exception because the bytes are always 'xref'.

So, my questions are:

1) Are these docs fixable or are they truly corrupt?

Without having the PDF itself in hand it's hard to give a 100% answer. But I guess there has to be a fix, as Adobe is able to open that PDF. I'll try to find one, following your description of the PDF.

2) Is this xref issue a known issue with PdfBox? I would try to create a document that displays the error but I honestly don't know how to do so (beyond sending the ones that we have that DO display it).

Not until now.

3) Do you have any idea how these documents end up in this state if they are being edited by tools such as InDesign, Acrobat, etc? Is there something I can do to identify them?

There are a lot of more or less corrupt files in the wild. Those are created using different tools.

4) If this is a truly corrupted document, why would Acrobat be able to open these files but pdfBox cannot? Are these streams somehow ignorable? I ask this because I saw this statement on a web page (http://resources.infosecinstitute.com/pdf-file-format-basic-structure/) when I
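The fixup behaviour described in the message above can be illustrated with a small, self-contained Java sketch. It is not PDFBox code: the class and method names (XrefOffsetSketch, nearestXrefKeyword, couldBeObjectHeader) are hypothetical, and the search strategy is only an assumption about what "returning the closest xref offset" amounts to. It shows why an offset that lands on the keyword 'xref' cannot be read as an xref stream: a stream is an indirect object and so starts with an object number, not with a keyword.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; hypothetical names, not taken from PDFBox. */
final class XrefOffsetSketch {

    private static final byte[] XREF = "xref".getBytes(StandardCharsets.ISO_8859_1);

    /** PDF whitespace characters: NUL, TAB, LF, FF, CR and SPACE. */
    static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == '\t' || b == '\n' || b == '\f' || b == '\r' || b == ' ';
    }

    /** True if the bytes at the given offset spell the keyword "xref". */
    static boolean startsWithXref(byte[] data, long offset) {
        if (offset < 0 || offset + XREF.length > data.length) {
            return false;
        }
        for (int i = 0; i < XREF.length; i++) {
            if (data[(int) offset + i] != XREF[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the offset of the "xref" keyword closest to the wanted offset,
     * or -1 if there is none. The keyword must start a token, so the "xref"
     * inside "startxref" is not counted.
     */
    static long nearestXrefKeyword(byte[] data, long wanted) {
        long best = -1;
        for (int i = 0; i + XREF.length <= data.length; i++) {
            boolean atTokenStart = i == 0 || isPdfWhitespace(data[i - 1]);
            if (atTokenStart && startsWithXref(data, i)) {
                if (best < 0 || Math.abs(i - wanted) < Math.abs(best - wanted)) {
                    best = i;
                }
            }
        }
        return best;
    }

    /**
     * An xref stream is an indirect object ("12 0 obj ..."), so its first byte
     * is a digit. A classic xref table starts with the letter 'x' instead.
     */
    static boolean couldBeObjectHeader(byte[] data, long offset) {
        if (offset < 0 || offset >= data.length) {
            return false;
        }
        byte b = data[(int) offset];
        return b >= '0' && b <= '9';
    }
}
```

With this sketch, a repair step that substitutes the result of nearestXrefKeyword for a broken /XRefStm value would always produce an offset for which couldBeObjectHeader is false, which matches the "instead got 'xref'" failure reported above.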
RE: setting permissions on a new document
Alright. After the exorcism, all is working. I have no idea why it wasn't working before. Thank you, Tilman!

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Friday, February 20, 2015 6:42 PM
To: users@pdfbox.apache.org
Subject: RE: setting permissions on a new document

Thank you, Tilman. Ha! Sorry, I was just giving the minimal code. The actual code was:

public static void main(String[] args) throws Exception {
    File f = new File("C:/temp/testPDF_protected.pdf");
    PDDocument document = new PDDocument();
    PDPage page = new PDPage();
    document.addPage(page);
    PDFont font = HELVETICA_BOLD;
    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    contentStream.beginText();
    contentStream.setFont( font, 12 );
    contentStream.moveTextPositionByAmount( 100, 700 );
    contentStream.drawString( "Hello World" );
    contentStream.endText();
    contentStream.close();
    AccessPermission ap = new AccessPermission();
    ap.setReadOnly();
    StandardProtectionPolicy spp = new StandardProtectionPolicy("owner", "user", ap);
    document.protect(spp);
    document.save(f);
    document.close();
}

The error is "There was a problem reading this document (57)". I can move the AccessPermission lines before or after creating the page, and I get the same error. If I don't create/add a page, I get the same error. If I comment out the AccessPermission - protect lines, Adobe Reader is able to open the file.

I generated the document on Windows with PDFBox 2.0 SNAPSHOT and Java 1.8.0_31.

For the record, I'm still sure I'm doing something wrong! :)

Best,
Tim

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Friday, February 20, 2015 5:25 PM
To: users@pdfbox.apache.org
Subject: Re: setting permissions on a new document

Hi Tim,

add a page to the document.

PDPage page = new PDPage();
document.addPage(page);

Tilman

On 20.02.2015 at 22:12, Allison, Timothy B. wrote:

All,

I'm trying to create a test doc for permission checking over on Tika. When I try the most basic program:

public static void main(String[] args) throws Exception {
    File f = new File("C:/temp/testPDF_protected.pdf");
    PDDocument document = new PDDocument();
    AccessPermission ap = new AccessPermission();
    ap.setReadOnly();
    StandardProtectionPolicy spp = new StandardProtectionPolicy("owner", "user", ap);
    document.protect(spp);
    document.save(f);
    document.close();
}

Adobe Reader isn't able to open the file. I'm sure that this is user error... what am I doing wrong?

Thank you.

Best,
Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
Importing FDF file issue
Hello, I am trying to import an FDF file into a PDF using PDFBox, but the PDF file comes back with no values in the fields. I am using PDFBox version 1.8.8. From the release notes it appears that this issue was fixed in 1.8.8. Here is the text from the release notes: [PDFBOX-2413] - Loaded FDF document returns null fields. Here is the line of code that I am using to populate the PDF: acroForm.importFDF(fdfdoc); I have done a lot of research and troubleshooting, so I thought I would ask whether anybody knows the status of this long-pending issue in PDFBox. Thanks, Rajeev.
Re: Importing FDF file issue
Hi, would you mind sharing your PDF template and FDF so we can have a look at it to replicate the issue? As the mailing list doesn't support attachments please upload the files to a public location. BR Maruan
Re: Importing FDF file issue
Does that mean, it is supposed to work? If that is the case, let me try to use a simple PDF with just one field. Thanks.
Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present (or variation of it still present)
@Andreas-

I have downloaded the latest trunk and came close (it got much further) before failing. However, I think I may have a fix for that failure: the code is returning 0 when the xrefstm fixedOffset is not found, yet it still tries to load and parse from xref 0, resulting in a null reference exception later in parser.parse(). Thinking about this, I came up with this:

// check for a XRef stream, it may contain some object ids of compressed objects
if (trailer.containsKey(COSName.XREF_STM))
{
    int streamOffset = trailer.getInt(COSName.XREF_STM);
    // check the xref stream reference
    fixedOffset = checkXRefStreamOffset(streamOffset, false); // <== fixedOffset comes back as 0 => not found
    if (fixedOffset > -1 && fixedOffset != streamOffset)
    {
        streamOffset = (int)fixedOffset; // <== streamOffset gets set to 0 here
        trailer.setInt(COSName.XREF_STM, streamOffset);
    }
    if (streamOffset > 0) // I added this test because an xref stream starting at
                          // offset 0 can never happen, so we should simply skip it
    {
        pdfSource.seek(streamOffset);
        skipSpaces();
        parseXrefObjStream(prev, false); // <== this call ultimately throws a null ref exception if streamOffset == 0 on entry
    }
}

Adding that, the file successfully parses.

Also, there is this proposal that I put up on GitHub in a repo that I forked directly from pdfbox (it is the only change). It relaxes the looping a bit to allow limited recursion. I would appreciate your thoughts on it.

https://github.com/santoch/pdfbox/commit/75cc32ab8307062709c30f1cfea5e2fdb8c00ddd

Thank you so much! You have been tremendously helpful. I wish I could have given you the files, but unfortunately, they are proprietary and we cannot release them.
:-(

Best regards-
Steve
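The guard proposed in this message relies on 0 meaning "not found". As a minimal sketch of the same decision in isolation, and assuming nothing about the real parser beyond what the message states, the logic could return an OptionalLong so that the "nothing usable" case cannot be mistaken for a real offset. The class and method names here are hypothetical and are not part of PDFBox.

```java
import java.util.OptionalLong;

/** Illustrative sketch only; hypothetical names, not taken from PDFBox. */
final class XRefStmOffsetSketch {

    /**
     * Decides which offset, if any, an xref stream should be parsed from.
     *
     * @param declaredOffset the /XRefStm value found in the trailer
     * @param fixedOffset    the result of a repair search; per the message above,
     *                       0 is assumed to mean "not found" and -1 "not checked"
     * @return the offset to parse from, or empty if the stream should be skipped
     */
    static OptionalLong usableStreamOffset(long declaredOffset, long fixedOffset) {
        long streamOffset = declaredOffset;
        if (fixedOffset > -1 && fixedOffset != declaredOffset) {
            // the repair search disagreed with the trailer, so trust the search
            streamOffset = fixedOffset;
        }
        // an xref stream can never start at offset 0, so treat that as "skip"
        return streamOffset > 0 ? OptionalLong.of(streamOffset) : OptionalLong.empty();
    }
}
```

A caller would then seek and parse only when the result is present, for example with usableStreamOffset(112085940L, fixedOffset).ifPresent(offset -> ...), which is the same effect as the added streamOffset > 0 test.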
Re: Importing FDF file issue
honestly - I don't know as I'm not using FDF. But I'm doing a lot with forms and PDFBox so I can look into that. A test case would be great. BR Maruan
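For the one-field test case discussed in this thread, a small FDF file can be written by hand rather than exported from a viewer. The Java sketch below builds such a file as text, following the basic FDF layout from the PDF specification (header, a catalog object with an /FDF dictionary and a /Fields array, trailer). It is only an assumption that a file this minimal is enough to reproduce the problem, and it has not been checked against importFDF in PDFBox 1.8.8. The class name, method names and the field name used in the usage note are placeholders; the field name must match a field in the PDF template.

```java
/** Illustrative sketch only; hypothetical names, not part of PDFBox. */
final class MinimalFdfSketch {

    /** Escapes backslashes and parentheses for a PDF literal string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds the text of an FDF file that sets one field (/T) to one value (/V). */
    static String singleFieldFdf(String fieldName, String value) {
        return "%FDF-1.2\n"
            + "1 0 obj\n"
            + "<< /FDF << /Fields [ << /T (" + escape(fieldName) + ")"
            + " /V (" + escape(value) + ") >> ] >> >>\n"
            + "endobj\n"
            + "trailer\n"
            + "<< /Root 1 0 R >>\n"
            + "%%EOF\n";
    }
}
```

Calling MinimalFdfSketch.singleFieldFdf("name", "Rajeev") and saving the result as ISO-8859-1 text gives a file that can be uploaded alongside a one-field PDF template.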