Ah, ok. That makes sense then. By checkbox I thought you meant a checkbox you can electronically interact with. If you do have a flat, non-form PDF then OCR would be the only way.
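For what it's worth, here is a rough sketch of the pixel-analysis route for a flat PDF. It assumes the page has already been rendered to a BufferedImage by some other tool, and that the check box rectangles are known in pixel coordinates. The class name CheckboxPixelProbe, its two methods, and the threshold values are hypothetical and only for illustration; nothing here is PDFBox API.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: guesses which box in a group is checked by counting dark pixels.
public class CheckboxPixelProbe {

    /** Fraction of pixels inside 'box' whose luminance is below 'threshold' (0-255). */
    public static double darkRatio(BufferedImage page, Rectangle box, int threshold) {
        // Clip the box to the image so getRGB never goes out of bounds.
        Rectangle r = box.intersection(new Rectangle(0, 0, page.getWidth(), page.getHeight()));
        if (r.isEmpty()) {
            return 0.0;
        }
        long dark = 0;
        for (int y = r.y; y < r.y + r.height; y++) {
            for (int x = r.x; x < r.x + r.width; x++) {
                int rgb = page.getRGB(x, y);
                int red = (rgb >> 16) & 0xFF;
                int green = (rgb >> 8) & 0xFF;
                int blue = rgb & 0xFF;
                // Integer approximation of perceived luminance.
                int lum = (red * 299 + green * 587 + blue * 114) / 1000;
                if (lum < threshold) {
                    dark++;
                }
            }
        }
        return (double) dark / ((long) r.width * r.height);
    }

    /** Index of the darkest box in the group, or -1 if no box exceeds 'minRatio'. */
    public static int checkedIndex(BufferedImage page, List<Rectangle> group,
                                   int threshold, double minRatio) {
        int best = -1;
        double bestRatio = minRatio;
        for (int i = 0; i < group.size(); i++) {
            double ratio = darkRatio(page, group.get(i), threshold);
            if (ratio > bestRatio) {
                bestRatio = ratio;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic white "page" standing in for a rendered PDF page.
        BufferedImage page = new BufferedImage(200, 50, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 50; y++) {
            for (int x = 0; x < 200; x++) {
                page.setRGB(x, y, 0xFFFFFF);
            }
        }
        // Draw an X-shaped mark inside the second box only.
        for (int k = 0; k < 16; k++) {
            page.setRGB(62 + k, 12 + k, 0x000000);
            page.setRGB(62 + k, 27 - k, 0x000000);
        }
        // Probe rectangles are assumed to sit just inside each box border.
        List<Rectangle> group = Arrays.asList(
                new Rectangle(12, 12, 16, 16),
                new Rectangle(62, 12, 16, 16),
                new Rectangle(112, 12, 16, 16));
        System.out.println(checkedIndex(page, group, 128, 0.05)); // prints 1
    }
}
```

Since at most one box per group is expected to be checked, picking the darkest box is more forgiving than a fixed cutoff per box. The minRatio floor is only there to report "none checked", and it would need tuning against real scans.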
One potential solution is to pre-process it with Acrobat and have it guess form fields (this actually works fairly well in some cases). That would turn it into a form PDF.

If you send me the form I can take a look.

Duane Nickull

***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
b. http://technoracle.blogspot.com
t. @duanechaos
"Don't fear the Graph! Embrace Neo4J"

On 2012-09-04 7:17 AM, "David Hoffer" <[email protected]> wrote:

>FYI, I just learned with some email discussions with iText users that
>the check boxes and check box marks are made using vector drawing
>instructions. Is that something that PDFBox can help parse? Or since
>I know where the check boxes are located am I better off converting
>this to an image and 'reading' the data via pixel analysis?
>
>-Dave
>
>On Tue, Sep 4, 2012 at 7:18 AM, David Hoffer <[email protected]> wrote:
>> Hi Duane,
>>
>> Thanks for your reply. I'll attach a sample of the type of document I
>> am trying to parse. As you can see it does have check boxes but it's
>> not a form based document.
>>
>> (Note that the check boxes might not be technically radio buttons from
>> the point of view of the PDF document...but in actual practice users
>> will check at most one per group of boxes.)
>>
>> Thanks,
>> -Dave
>>
>> On Mon, Sep 3, 2012 at 10:37 PM, Duane Nickull
>> <[email protected]> wrote:
>>> If you have a PDF document with a check box, by definition it is a
>>> "form" (it is a mutable document). A radio button is, by definition,
>>> a group of choices where one is mutually selectable (two cannot be
>>> chosen).
>>>
>>> It is not that tricky to get access to the checkbox. There are
>>> examples on the PDFBox website and also within the API Docs.
>>>
>>> java.lang.Object
>>> <http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html?is-external=true>
>>>   org.apache.pdfbox.pdmodel.interactive.form.PDField
>>>   <http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/interactive/form/PDField.html>
>>>     org.apache.pdfbox.pdmodel.interactive.form.PDChoiceButton
>>>     <http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/interactive/form/PDChoiceButton.html>
>>>       org.apache.pdfbox.pdmodel.interactive.form.PDCheckbox
>>>
>>> isChecked
>>> public boolean isChecked()
>>>
>>> This will tell if this radio button is currently checked or not.
>>>
>>> Returns: true If the radio button is checked.
>>>
>>> If you require specific help with this, many of our staff are ex-adobe
>>> experts on PDF forms.
>>>
>>> Duane Nickull
>>>
>>> On 2012-09-03 6:47 AM, "David Hoffer" <[email protected]> wrote:
>>>
>>>>I have PDF's (regular pdf, non-form type) that contain check boxes and
>>>>I need to parse which one is selected. So each group of check boxes
>>>>is a radio button group where only one will be selected/checked. How
>>>>can I parse this to find out which in the group is checked?
>>>>
>>>>Thanks,
>>>>-Dave

