Re: How to flatedecode and find all acroform fields in a compressed PDF

Andreas Lehmkühler Thu, 21 May 2015 23:31:43 -0700

Hi,

> Balaji Venkatamohan <[email protected]> wrote on 20 May 2015 at 03:24:
> 
> 
> Thank you for your pointers and sorry about the image. I am attaching it
> with this email.
> 
> The point I am trying to make is that the PDF, which was decompressed using
> WriteDecodedDoc, is smaller in size than the original PDF given to us by
> our customers.
> Also, the decompressed PDF generated by WriteDecodedDoc of PDFBox did not
> have any AcroForm fields, whereas the decompressed PDF given to us by the
> customers does contain Acroform fields. Hence I wanted to know how to
> properly decompress the PDF using pdfbox APIs. The reason why I was
> analyzing COSStream was to check if the decompression of the compressed PDF
> was happening correctly while using PDFBox APIs.
> I know it would have been difficult for you to help me without the actual
> PDFs. For that, I would like to thank you for your time and pointers.
It may be worth sharing the files with us "visually". Open both files
(compressed and decompressed) with PDFDebugger [1], upload a screenshot of each
somewhere (Dropbox etc.) and send us the link. That might shed some light on
your issue.
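
On the FlateDecode question itself, in case it helps: the body of a
/FlateDecode stream is ordinary zlib data, so it can be inflated with the JDK
alone. The following is only a minimal sketch and not PDFBox code. The class
name FlateSketch and its two helper methods are made up for illustration, it
assumes you have already extracted the raw bytes between "stream" and
"endstream", and it ignores any /DecodeParms predictor. Keep in mind that
inflating page content streams will not reveal form fields: as Tilman wrote,
those hang off the AcroForm entry in the document catalog.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of PDFBox.
public class FlateSketch {

    /** Inflates zlib-wrapped bytes, i.e. the raw body of a /FlateDecode stream. */
    public static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished: the input is truncated
                // or needs a preset dictionary we do not have.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Compresses bytes with zlib; used here only to build test input. */
    public static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a page content stream; a real one would come from the PDF.
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1);
        byte[] roundTrip = inflate(deflate(content));
        System.out.println(new String(roundTrip, StandardCharsets.ISO_8859_1));
    }
}
```

If the inflated page content looks sane, decompression is not the problem, and
the place to compare the two files is the catalog entry rather than the pages.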


BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/1.8/commandline.html#pdfDebugger

> 
> On Tue, May 19, 2015 at 2:57 PM, Tilman Hausherr <[email protected]>
> wrote:
> 
> > Hi,
> >
> > The image doesn't appear in the mailing list.
> >
> > This is all very confusing... /AcroForm is in the document catalog. I
> > don't see how the page content stream is related to it. Your best option
> > is to either go through the source code, or read the spec and then look
> > at the PDF.
> >
> > To find out what's going on, you'd have to start from that /AcroForm entry
> > and then compare the two files.
> >
> > It is really difficult to help you without the files. The cause could be a
> > bug in pdfbox, or a malformed pdf...
> >
> > Some more ideas:
> > - use loadNonSeq(file, null) instead of load(file)
> > - try the unreleased 2.0 version, that one has some improvements in the
> > acroform stuff. Note that the API is different.
> > https://pdfbox.apache.org/download.cgi#scm
> > https://pdfbox.apache.org/2.0/getting-started.html
> >
> > If you still need help, one possibility would be 1) post the smallest
> > possible code that fails, and 2) post a small part of the raw PDF, i.e. the
> > objects relevant to the field in your code.
> >
> >
> > Tilman
> >
> >
> > On 19.05.2015 at 23:03, Balaji Venkatamohan wrote:
> >
> >> Moreover, for every page of the compressed PDF (there are 3 pages), I
> >> tried getting the COSStream of the page:
> >>
> >> // PDFBox 1.8 API: fetch the first page and its content stream
> >> PDPage firstPage =
> >>     (PDPage) document.getDocumentCatalog().getAllPages().get(0);
> >> PDStream pdStream = firstPage.getContents();
> >> COSStream stream = pdStream.getStream();
> >>
> >> In the above code snippet, the object stream, when analyzed in debug
> >> mode, has the following:
> >>
> >>
> >> The line from the compressed PDF as opened with Notepad++ is:
> >>
> >> <</Filter/FlateDecode/Length 5675>>stream
> >>
> >> From this point on, using the COSStream object for every page, how can I
> >> decompress and find out the acroform fields given that the unFilteredStream
> >> object is null for COSStream?
> >> 
> >>
> >> On Tue, May 19, 2015 at 1:38 PM, Balaji Venkatamohan <[email protected]>
> >> wrote:
> >>
> >>     Thank you for your response Tilman.
> >>
> >>     I had previously tried WriteDecodedDoc on my compressed PDF and
> >>     then tried to count the AcroForm fields in the output file it
> >>     generated. The API still could not find any AcroForm fields in
> >>     the generated decompressed file. Also, the generated file is
> >>     75 KB, far smaller than the original decompressed file I have
> >>     (1.6 MB), though I could edit the AcroForm fields using Acrobat
> >>     Reader.
> >>
> >>     Thanks,
> >>     Balaji
> >>
> >>
> >>
> >>     On Tue, May 19, 2015 at 1:18 PM, Tilman Hausherr
> >>     <[email protected]> wrote:
> >>
> >>         On 19.05.2015 at 21:35, Balaji Venkatamohan wrote:
> >>
> >>             My question is: how do I FlateDecode a PDF so that I can
> >>             find all the AcroForm fields within it? Any help or
> >>             pointers would be highly appreciated.
> >>
> >>
> >>         You could try the WriteDecodedDoc option of the command line app
> >>         https://pdfbox.apache.org/1.8/commandline.html#writeDecodeDoc
> >>
> >>         Comparing the two files with Notepad++ might give you further
> >>         ideas... however, the two files might have their objects in a
> >>         different order.
> >>
> >>         Tilman
> >>
> >>
> >>
> >>
> >>         ---------------------------------------------------------------------
> >>         To unsubscribe, e-mail: [email protected]
> >>         For additional commands, e-mail: [email protected]
> >>
> >>
> >>
> >>
> >