Re: [External] Re: Extracting non-form checkbox values

Conlin, Joshua [USA] Thu, 25 Aug 2016 12:37:06 -0700

First off, thanks for your quick reply and help. I am new to PDFBox and
am using version 2.0.1. XFA is indeed unavailable. I am unable to upload
a sample PDF due to privacy concerns. I ran PDFDebugger against this
file, and it produced some output. Here is the general structure of page
1 (which contains 48 checkboxes):


Page:1
 [] Annots: (0)
 <<>> Contents: (2) [5 0 R]
      / Filter: FlateDecode
      Length: 7141
 [] MediaBox: (4)
      0: 0
      1: 0
      2: 612
      3: 792
 <<>> Parent: (4) [4 0 R] /T:Pages   (not sure whether more detail is needed on this)
 <<>> Resources: (2) [7 0 R]
      <<>> Font: (4)
           <<>> TT1: (8) [8 0 R] /T:Font /S:TrueType
           <<>> TT2: (8) [9 0 R] /T:Font /S:TrueType
           <<>> TT3: (8) [10 0 R] /T:Font /S:TrueType
           <<>> TT4: (8) [11 0 R] /T:Font /S:TrueType
      [] ProcSet: (2)
           / 0: PDF
           / 1: Text

I'm leaning toward the image-capture idea but am not sure where to
start (rendering a PDF subsection as an image). Any insight there? Worst
case, I suppose I could export the entire page to an image and do
some analysis there. The solution doesn't necessarily have to be
performant.
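A minimal sketch of the cropping step, under some assumptions: the whole page is rendered first (in PDFBox 2.x, PDFRenderer.renderImageWithDPI(pageIndex, dpi) returns a BufferedImage), the checkbox position is already known in PDF units (for example measured once, since the layout is fixed), the page is not rotated, and the MediaBox starts at 0,0 as in the dump above. PDF user space puts the origin at the bottom-left with 72 units per inch, while image pixels start at the top-left, so the y axis is flipped using the MediaBox height (792 here). The class PdfRegion and the sample coordinates are hypothetical.

```java
import java.awt.Rectangle;

/** Hypothetical helper: maps a rectangle in PDF user space to pixels of a page rendered at a given DPI. */
public final class PdfRegion {

    private PdfRegion() {}

    /**
     * @param x          left edge in PDF units (1/72 inch), measured from the page's left edge
     * @param y          bottom edge in PDF units, measured from the page's bottom edge
     * @param width      width in PDF units
     * @param height     height in PDF units
     * @param pageHeight MediaBox height in PDF units (792 for the page above)
     * @param dpi        resolution the page was rendered at
     * Assumes an unrotated page whose MediaBox lower-left corner is (0, 0).
     */
    public static Rectangle toPixels(float x, float y, float width, float height,
                                     float pageHeight, float dpi) {
        float scale = dpi / 72f;                                  // PDF units -> pixels
        int px = Math.round(x * scale);
        // Flip the y axis: PDF origin is bottom-left, image origin is top-left.
        int py = Math.round((pageHeight - (y + height)) * scale);
        int pw = Math.round(width * scale);
        int ph = Math.round(height * scale);
        return new Rectangle(px, py, pw, ph);
    }

    public static void main(String[] args) {
        // A 10 x 10 unit box whose lower-left corner is at (72, 700) on a 612 x 792 page, rendered at 144 DPI.
        Rectangle r = toPixels(72f, 700f, 10f, 10f, 792f, 144f);
        System.out.println(r); // java.awt.Rectangle[x=144,y=164,width=20,height=20]
    }
}
```

The returned rectangle can then be passed to image.getSubimage(r.x, r.y, r.width, r.height) on the rendered page. A higher DPI gives each checkbox more pixels to analyze at the cost of memory and time, which should not matter much since performance is not a concern here.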

I'd like to avoid using a separate OCR framework and just stick with
PDFBox if possible.
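Since the dump shows no annotations (Annots: (0)), the checkboxes appear to be plain page content, so deciding whether a box is ticked may not need OCR at all; counting dark pixels inside the cropped box is often enough. A minimal sketch using only the JDK, assuming the input is the checkbox region already cropped from a rendered page. The class CheckboxDetector, the inset that skips the printed border, and both thresholds are hypothetical and would need tuning against a few boxes known to be checked and unchecked.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: guesses whether a cropped checkbox image is ticked by counting dark pixels. */
public final class CheckboxDetector {

    private CheckboxDetector() {}

    /**
     * Fraction of pixels inside the box (ignoring {@code inset} pixels on every side,
     * so the printed border is not counted) whose luminance is below {@code darkThreshold}.
     */
    public static double darkFraction(BufferedImage box, int inset, int darkThreshold) {
        int w = box.getWidth() - 2 * inset;
        int h = box.getHeight() - 2 * inset;
        if (inset < 0 || w <= 0 || h <= 0) {
            throw new IllegalArgumentException("inset does not fit inside the image");
        }
        long dark = 0;
        for (int y = inset; y < inset + h; y++) {
            for (int x = inset; x < inset + w; x++) {
                int rgb = box.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of Rec. 601 luma (0 = black, 255 = white).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                if (luma < darkThreshold) {
                    dark++;
                }
            }
        }
        return (double) dark / ((long) w * h);
    }

    /** Treats the box as checked when more than {@code minFraction} of its interior is dark. */
    public static boolean isChecked(BufferedImage box, int inset, int darkThreshold, double minFraction) {
        return darkFraction(box, inset, darkThreshold) > minFraction;
    }
}
```

For example, isChecked(box, 2, 128, 0.1) treats a box as checked when more than 10% of its interior, ignoring a 2-pixel border, is darker than mid-gray.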

Thanks again for your help.

Josh

On 8/25/16, 1:21 AM, "Maruan Sahyoun" <sahy...@fileaffairs.de> wrote:

>
>> On 24.08.2016 at 19:24, Tilman Hausherr <thaush...@t-online.de> wrote:
>> 
>> On 24.08.2016 at 18:22, Conlin, Joshua [USA] wrote:
>>> Hello,
>>> 
>>> 
>>> I am trying to extract checkbox values from a document where the
>>>AcroForm is null. I have seen several previous inquiries about this
>>>scenario but haven't found a definitive answer. I was wondering whether
>>>there is a suggested approach.
>> 
>> Maybe XFA?
>
>AFAIU, if there is no AcroForm, there will also be no XFA.
>
>Would it be possible to upload a sample PDF to a public location so we
>can take a look?
>
>BR
>
>Maruan
>
>> 
>> Tilman
>> 
>> 
>>> 
>>> 
>>> Alternatively, is there a way to extract a subsection of a PDF page and
>>>create an image from it? To be clear, I am not talking about
>>>extracting an embedded image, but creating an image from a rectangle or
>>>similar area within a page. With this (maybe naive) approach, I could
>>>extract the checkbox location as an image and determine whether it is
>>>checked. Any help or insight you could provide would be appreciated.
>>> 
>>> 
>>> Thanks,
>>> 
>>> 
>>> Josh
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
>> For additional commands, e-mail: users-h...@pdfbox.apache.org
>> 
