Hi

> It could be an XFA forms pdf... then you'd have to analyze the XML content.
I opened the PDF in a text editor, and I can say the boxes are in a
stream XML entity, in binary format. (By removing some of the binary
data, I was able to make the boxes disappear.)
Does that rule out an XFA form PDF?
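
For reference, the text-editor check can be repeated programmatically. The
sketch below is not PDFBox code; it uses only the Java standard library and
counts a few PDF name tokens in the raw bytes: /XFA and /AcroForm for forms,
/Widget and /Btn for widget annotations and button fields. The class name
PdfMarkerScan and the marker list are made up for illustration. A count of
zero is not conclusive, since objects can be stored in compressed object
streams where these tokens are not visible as plain text, and a plain
substring match could also hit longer names that start with the same
characters.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: counts raw name tokens in a PDF.
public class PdfMarkerScan {

    // Name tokens hinting at XFA, AcroForm, widget annotations, button fields.
    // This list is an assumption chosen for illustration.
    static final String[] MARKERS = {"/XFA", "/AcroForm", "/Widget", "/Btn"};

    // Counts non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int n = 0;
        int i = haystack.indexOf(needle);
        while (i >= 0) {
            n++;
            i = haystack.indexOf(needle, i + needle.length());
        }
        return n;
    }

    // Returns marker -> number of occurrences in the raw file bytes.
    static Map<String, Integer> scan(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary stream
        // data cannot cause decoding errors.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String marker : MARKERS) {
            hits.put(marker, count(raw, marker));
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // With no argument, scan a tiny hand-written fragment instead of a file.
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "<< /Subtype /Widget /FT /Btn /AS /Off >>"
                        .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(scan(bytes));
    }
}
```

Run it as `java PdfMarkerScan file.pdf`. If /Widget shows up while /AcroForm
does not, that would point towards the "widget annotations without acroform"
case; if nothing shows up, the file may simply keep its objects compressed.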

> It could be ordinary text, then the text stripper would do the job.
The regular text stripper does not extract them. Does that rule out
ordinary text?

Thanks a lot

On Thu, Nov 29, 2018 at 08:04:51AM +0100, Tilman Hausherr wrote:
> It could be an XFA forms pdf... then you'd have to analyze the XML content.
> 
> It could be widget annotations without an AcroForm, then you'd have to
> analyse these.
> 
> It could be ordinary text, then the text stripper would do the job.
> 
> It could be vector graphics, then it gets really difficult.
> 
> Tilman
> 
> Am 28.11.2018 um 23:05 schrieb Nicolas Paris:
> > Hi
> > 
> > I have several PDFs created with PDFCreator 2.0.1.0 and I want to extract
> > their content as text, including the values of the checkboxes in them.
> > 
> > The PDF looks like a regular form PDF with checkboxes. However, it is not
> > an AcroForm-based PDF, and the regular PDFBox code I use in this case
> > does not apply: the AcroForm is null!
> > 
> > I wonder how I can iterate on those checkboxes (or visually equivalent)
> > objects or symbols.
> > 
> > If someone can give me a starter to list all objects in that pdf, that
> > might be helpful to begin with.
> > 
> > Thanks in advance,
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 

-- 
nicolas
