As Tilman pointed out, XFA is XML, so as soon as you have the XML you can
extract the content you are interested in.

If you have a static XFA-based form (i.e. there is an XFA entry and the /Fields
entry is not empty), you can safely ignore the fact that it's XFA and do the
text extraction as you would for a 'normal' PDF.

For a dynamic XFA-based PDF you could parse the XML, but please note that due to
its dynamic nature the correct state of the rendered document is determined at
runtime by executing data binding rules and JavaScript, so it cannot be
determined without rendering the XFA first. You could still statically extract
the content you are interested in from the XML, though.
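
A rough sketch of such a static extraction, using only the JDK's DOM API. It
assumes the XFA packet contains an xfa:data element in the usual XFA data
namespace and that the org.w3c.dom.Document (e.g. the one returned by
PDXFA.getDocument()) was parsed namespace-aware. The class and method names
(XfaDataDump, extractData, parse) are made up for illustration and are not part
of PDFBox. It only reads the stored data values and does not evaluate any
binding rules or scripts.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDFBox.
class XfaDataDump {
    // Namespace of the xfa:datasets / xfa:data elements (assumed to be present).
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Convenience for trying this out with an XML string instead of a PDF.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Returns dotted element path -> text for every leaf element below xfa:data.
    // Repeated paths (e.g. table rows) overwrite each other in this simple version.
    static Map<String, String> extractData(Document xfa) {
        Map<String, String> values = new LinkedHashMap<>();
        NodeList dataNodes = xfa.getElementsByTagNameNS(XFA_DATA_NS, "data");
        for (int i = 0; i < dataNodes.getLength(); i++) {
            collect((Element) dataNodes.item(i), "", values);
        }
        return values;
    }

    private static void collect(Element element, String path, Map<String, String> values) {
        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElement = true;
                String name = child.getLocalName();
                collect((Element) child, path.isEmpty() ? name : path + "." + name, values);
            }
        }
        // Record leaf elements only; the xfa:data element itself has an empty path.
        if (!hasChildElement && !path.isEmpty()) {
            values.put(path, element.getTextContent().trim());
        }
    }
}
```

With PDFBox you would pass the Document you get from PDXFA.getDocument() to
extractData(). The keys depend entirely on how the form author named the data
nodes, so you will have to look at the XML of your own documents first.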

AFAIK Apache Tika does some (static) extraction of XFA content (fields).

BR
Maruan

> We don't handle xfa, you're on your own there, or should buy a product 
> that can (I think itext can do it).
> 
> XFA is some sort of XML. So after you have getDocument() you need to 
> look at the XML you get. The XFA specification is 1500 pages long.
> 
> If all the documents you want to handle have the same content, then you 
> might be able to get what you need without reading it.
> 
> Tilman
> 
> Am 23.02.2019 um 02:55 schrieb Nick Westerly:
> > Hi, my ultimate goal is to extract text data from PDFs forms using xfa. Is
> > it possible to use pdfbox to flatten PDFs with xfa forms ( to simplify text
> > extraction).
> > 
> > If not can the fields themselves be easily parsed?
> > 
> > I see
> > https://stackoverflow.com/questions/14454387/pdfbox-how-to-flatten-a-pdf-form
> > which seems to say that xfa is not flatten able?
> > 
> > I see this class,
> > https://pdfbox.apache.org/docs/1.8.12/javadocs/org/apache/pdfbox/pdmodel/interactive/form/PDXFA.html,
> > once I call getDocument, how can I get fields (by name/type/) and contents?
> > 
> > Thanks!
> > 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 