Hi,

> Balaji Venkatamohan <[email protected]> wrote on 20 May 2015 at 03:24:
>
> Thank you for your pointers and sorry about the image. I am attaching it
> with this email.
>
> The point I am trying to make is that the PDF, which was decompressed using
> WriteDecodedDoc, is smaller in size than the original PDF given to us by
> our customers.
> Also, the decompressed PDF generated by WriteDecodedDoc of PDFBox did not
> have any PDAcroForm fields, whereas the decompressed PDF given to us by the
> customers does contain AcroForm fields. Hence I wanted to know how to
> properly decompress the PDF using the PDFBox APIs. The reason why I was
> analyzing COSStream was to check whether the decompression of the compressed
> PDF was happening correctly while using the PDFBox APIs.
> I know it would have been difficult for you to help me without the actual
> PDFs. For that, I would like to thank you for your time and pointers.

Maybe it's worth trying to share the files "visually" with us. Open both files
(compressed and decompressed) with PDFDebugger [1], post a screenshot of each
somewhere (Dropbox etc.) and share the link with us. Maybe that could shed
some light on your issue.
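On the original question of how to "flatedecode" a stream: below is a minimal
sketch, not PDFBox code, of what /Filter /FlateDecode means at the byte level.
It assumes plain Java with java.util.zip only. The class name
FlateDecodeSketch and the helpers flateDecode and flateEncode are made up for
this illustration, and the sample bytes are a hand-written content stream
fragment, not data from your file. Keep in mind Tilman's point that the
/AcroForm entry lives in the document catalog, so decoding page content
streams this way will not by itself reveal any form fields.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class for illustration; it is not part of PDFBox.
class FlateDecodeSketch {

    // Inflates zlib-wrapped deflate data, the encoding used by /Filter /FlateDecode.
    static byte[] flateDecode(byte[] encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the input ended early or needs a preset dictionary.
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid FlateDecode data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Only used to produce sample input; a real PDF already holds the compressed bytes.
    static byte[] flateEncode(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Assumed sample: a tiny page content stream fragment.
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = flateDecode(flateEncode(content));
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

Running main prints the original fragment again, which only shows that the
round trip works. For a real file, the bytes between "stream" and "endstream"
would be the input to flateDecode, provided the stream has no other filters
and no /DecodeParms predictor.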

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/1.8/commandline.html#pdfDebugger

> On Tue, May 19, 2015 at 2:57 PM, Tilman Hausherr <[email protected]>
> wrote:
>
> > Hi,
> >
> > The image doesn't appear in the mailing list.
> >
> > This is all very confusing... /AcroForm is in the document catalog. I
> > don't see how the page content stream is related to it. The best approach
> > is to either go through the source code, or read the spec and then look
> > at the PDF.
> >
> > To find out what's going on, you'd have to start from that /AcroForm entry
> > and then compare the two files.
> >
> > It is really difficult to help you without the files. The cause could be a
> > bug in PDFBox, or a malformed PDF...
> >
> > Some more ideas:
> > - use loadNonSeq(file, null) instead of load(file)
> > - try the unreleased 2.0 version, which has some improvements in the
> >   AcroForm code. Note that the API is different.
> >   https://pdfbox.apache.org/download.cgi#scm
> >   https://pdfbox.apache.org/2.0/getting-started.html
> >
> > If you still need help, one possibility would be to 1) post the smallest
> > possible code that fails, and 2) post a small part of the raw PDF, i.e. the
> > objects relevant to the field in your code.
> >
> > Tilman
> >
> > On 19.05.2015 at 23:03, Balaji Venkatamohan wrote:
> >
> >> Moreover, for every page of the compressed PDF (there are 3 pages), I
> >> tried getting the COSStream of the page:
> >>
> >> PDPage firstPage = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
> >> PDStream pdStream = firstPage.getContents();
> >> COSStream stream = pdStream.getStream();
> >>
> >> In the above code snippet, the object stream, when analyzed in debug
> >> mode, has the following:
> >>
> >>
> >> The line from the compressed PDF as opened with Notepad++ is:
> >>
> >> <</Filter/FlateDecode/Length 5675>>stream
> >>
> >> From this point on, using the COSStream object for every page, how can I
> >> decompress and find the AcroForm fields, given that the unFilteredStream
> >> object is null for the COSStream?
> >>
> >>
> >> On Tue, May 19, 2015 at 1:38 PM, Balaji Venkatamohan <[email protected]>
> >> wrote:
> >>
> >>     Thank you for your response, Tilman.
> >>
> >>     I had previously tried using WriteDecodedDoc on my compressed
> >>     PDF, and I tried to get the number of AcroForm fields present in
> >>     the output file generated by WriteDecodedDoc. The API still could
> >>     not find the AcroForm fields in the generated decompressed file.
> >>     Also, the generated decompressed file is 75 KB, which is far less
> >>     than the original decompressed file which I have (1.6 MB), though I
> >>     could edit the AcroForm fields using Acrobat Reader.
> >>
> >>     Thanks,
> >>     Balaji
> >>
> >>
> >>     On Tue, May 19, 2015 at 1:18 PM, Tilman Hausherr
> >>     <[email protected]> wrote:
> >>
> >>         On 19.05.2015 at 21:35, Balaji Venkatamohan wrote:
> >>
> >>             My question is: how do I flatedecode a PDF so that I can
> >>             find all the AcroForm fields within it? Any help or
> >>             pointers would be highly appreciated.
> >>
> >>         You could try the WriteDecodedDoc option of the command line app
> >>         https://pdfbox.apache.org/1.8/commandline.html#writeDecodeDoc
> >>
> >>         Maybe you can get further ideas by comparing the two files
> >>         with Notepad++... however, the two files might have their
> >>         objects in a different order.
> >>
> >>         Tilman
> >>
> >>         ---------------------------------------------------------------------
> >>         To unsubscribe, e-mail: [email protected]
> >>         For additional commands, e-mail: [email protected]

