Hi,

> It could be an XFA forms pdf... then you'd have to analyze the XML content.

I opened the PDF in a text editor, and I can say that the boxes are in a stream XML entity, in binary format. (By removing some of the binary data, I was able to remove the boxes.) Does that rule out an XFA form PDF?
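Since the boxes appear to live in a binary stream, one way to see what that stream actually contains is to decompress it. Below is a rough stdlib-only Java sketch, not PDFBox code. It assumes the stream uses /FlateDecode, the most common filter, and ignores /Length, /DecodeParms and filter chains, so treat it as a diagnostic only. The class name StreamInflater is hypothetical. In the inflated output, text-showing operators such as Tj or TJ between BT and ET would suggest ordinary text (for example check marks drawn with a symbol font). Path operators such as re, m and l followed by S or f would instead point to vector graphics.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/**
 * Rough sketch (hypothetical helper): finds raw "stream ... endstream" bodies in a
 * PDF file and tries to inflate each one, assuming the /FlateDecode filter.
 * This is NOT a PDF parser: it ignores /Length, /DecodeParms and filter chains.
 */
class StreamInflater {

    /** Returns the decompressed text of every stream that inflates cleanly. */
    public static List<String> inflateStreams(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets.
        String latin = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (true) {
            int idx = latin.indexOf("stream", pos);
            if (idx < 0) {
                break;
            }
            // Skip the "stream" that is part of the "endstream" keyword.
            if (idx >= 3 && latin.startsWith("end", idx - 3)) {
                pos = idx + 6;
                continue;
            }
            int dataStart = idx + 6;
            // The stream keyword is followed by CRLF or LF before the data starts.
            if (latin.startsWith("\r\n", dataStart)) {
                dataStart += 2;
            } else if (latin.startsWith("\n", dataStart)) {
                dataStart += 1;
            } else {
                pos = dataStart; // not a stream keyword, keep scanning
                continue;
            }
            int end = latin.indexOf("endstream", dataStart);
            if (end < 0) {
                break;
            }
            String decoded = tryInflate(Arrays.copyOfRange(pdf, dataStart, end));
            if (decoded != null) {
                result.add(decoded);
            }
            pos = end + "endstream".length();
        }
        return result;
    }

    /** Inflates zlib (FlateDecode) data; returns null if it is not valid zlib data. */
    static String tryInflate(byte[] data) {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        boolean complete = false;
        try {
            inflater.setInput(data);
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or dictionary-based data: give up
                }
                out.write(chunk, 0, n);
            }
            complete = inflater.finished();
        } catch (DataFormatException e) {
            return null; // not Flate data, e.g. a DCTDecode (JPEG) image
        } finally {
            inflater.end();
        }
        return complete ? new String(out.toByteArray(), StandardCharsets.ISO_8859_1) : null;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        List<String> streams = inflateStreams(pdf);
        for (int i = 0; i < streams.size(); i++) {
            System.out.println("=== inflated stream #" + i + " ===");
            System.out.println(streams.get(i));
        }
    }
}
```

Run it as `java StreamInflater form.pdf` (file name hypothetical) and look at the inflated content streams of the page that shows the boxes.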

> It could be ordinary text, then the text stripper would do the job.

The regular text stripper does not extract them. Does that rule out ordinary text?

Thanks a lot

On Thu, Nov 29, 2018 at 08:04:51AM +0100, Tilman Hausherr wrote:
> It could be an XFA forms pdf... then you'd have to analyze the XML content.
>
> It could be widget annotations without an AcroForm, then you'd have to analyse
> these.
>
> It could be ordinary text, then the text stripper would do the job.
>
> It could be vector graphics, then it gets really difficult.
>
> Tilman
>
> On 28.11.2018 at 23:05, Nicolas Paris wrote:
> > Hi
> >
> > I have several PDFs created with PDFCreator 2.0.1.0 and I want to extract
> > the content as text, including the checkbox values in it.
> >
> > The PDF looks like a regular form PDF with checkboxes. However, it is not
> > an AcroForm-based PDF, and the regular PDFBox code I use in this case
> > does not apply: the AcroForm is null!
> >
> > I wonder how I can iterate over those checkboxes (or visually equivalent)
> > objects or symbols.
> >
> > If someone can give me a starter to list all objects in that PDF, that
> > might be helpful to begin with.
> >
> > Thanks in advance,
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

--
nicolas
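For the request to list all objects, here is a similarly rough stdlib-only Java sketch. The class PdfObjectLister is hypothetical and is not part of PDFBox. It scans the raw bytes for indirect object headers such as `12 0 obj` and counts a few name keys relevant to the possibilities Tilman listed:

- /AcroForm and /XFA: per the PDF specification, an XFA form keeps its XML under the /XFA key of the AcroForm dictionary.
- /Widget: widget annotations.
- /ObjStm: a non-zero count means some objects are packed inside compressed object streams, which a raw scan like this cannot see. A real parser such as PDFBox handles those.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Rough sketch (hypothetical helper): lists indirect objects ("num gen obj") found in
 * the raw, uncompressed bytes of a PDF and counts a few telltale name keys.
 * Objects stored inside compressed object streams (/ObjStm) are invisible here.
 */
class PdfObjectLister {

    // Matches indirect object headers such as "12 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Returns "num gen" for every indirect object header, in file order. */
    public static List<String> listObjects(byte[] pdf) {
        // ISO-8859-1 keeps a 1:1 byte-to-char mapping, so binary data does not break the scan.
        String latin = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_HEADER.matcher(latin);
        List<String> ids = new ArrayList<>();
        while (m.find()) {
            ids.add(m.group(1) + " " + m.group(2));
        }
        return ids;
    }

    /** Counts occurrences of a PDF name such as "/Widget", ignoring longer names. */
    public static int countName(byte[] pdf, String name) {
        String latin = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = Pattern.compile(Pattern.quote(name) + "(?![A-Za-z0-9])").matcher(latin);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        List<String> ids = listObjects(pdf);
        System.out.println(ids.size() + " objects: " + ids);
        for (String name : new String[] {"/AcroForm", "/XFA", "/Widget", "/ObjStm"}) {
            System.out.println(name + " occurrences: " + countName(pdf, name));
        }
    }
}
```

A zero count for /Widget together with a non-zero /ObjStm count would be inconclusive, since the widgets could sit inside an object stream; in that case a full parser is the better next step.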

