As Tilman pointed out, XFA is XML, so as soon as you have the XML you can extract the content you are interested in.
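To make the "extract it from the XML" step concrete, here is a minimal sketch that uses only the JDK's DOM API. It assumes you already have the XFA packet as XML text. In PDFBox 2.x, PDXFAResource.getDocument() already returns an org.w3c.dom.Document, so you could pass that straight to extractData and skip parse. It also assumes the filled-in values sit under xfa:datasets/xfa:data, which is where XFA forms typically keep their data. The class and method names (XfaDataDump, parse, extractData) are hypothetical, and the sample XML is made up. For a dynamic form this is only a static read of the data packet. It does not run any data binding or JavaScript.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: statically dumps the data packet of an XFA form.
public class XfaDataDump {

    // Parses XFA XML text into a namespace-aware DOM.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns "path/to/leaf" -> text for every leaf element below datasets/data.
    static Map<String, String> extractData(Document doc) {
        Map<String, String> out = new LinkedHashMap<>();
        // Match <datasets> in any namespace (assumption: one per XFA packet).
        NodeList sets = doc.getElementsByTagNameNS("*", "datasets");
        for (int i = 0; i < sets.getLength(); i++) {
            for (Node n = sets.item(i).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && "data".equals(n.getLocalName())) {
                    walk((Element) n, "", out);
                }
            }
        }
        return out;
    }

    // Depth-first walk; elements without child elements are treated as field values.
    private static void walk(Element e, String prefix, Map<String, String> out) {
        boolean hasChildElement = false;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                hasChildElement = true;
                String name = c.getLocalName();
                walk((Element) c, prefix.isEmpty() ? name : prefix + "/" + name, out);
            }
        }
        if (!hasChildElement && !prefix.isEmpty()) {
            // Repeated paths (e.g. rows of a repeating subform) are joined with a newline.
            out.merge(prefix, e.getTextContent().trim(), (a, b) -> a + "\n" + b);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\">"
          + "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
          + "<xfa:data><form1><name>Jane</name><city>Springfield</city></form1></xfa:data>"
          + "</xfa:datasets></xdp:xdp>";
        // Prints "form1/name = Jane" then "form1/city = Springfield".
        extractData(parse(xml)).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Keying by element path is just one simple choice. A real form with repeating subforms or same-named fields in different subforms may need smarter keys, and mapping data nodes back to the field definitions in the template packet is a separate step this sketch does not attempt.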
If you have a static XFA-based form (i.e. there is an XFA entry and the /Fields entry is not empty), you can safely ignore the fact that it's XFA and do the text extraction as you would for a "normal" PDF. For a dynamic XFA-based PDF you could parse the XML, but please note that, due to its dynamic nature, the correct state of the rendered document is determined at runtime by executing data binding rules and JavaScript, so it cannot be determined without rendering the XFA first. You could still statically extract the content you are interested in from the XML, though.

AFAIK Apache Tika does some (static) extraction of XFA content (fields).

BR
Maruan

> We don't handle XFA, you're on your own there, or should buy a product
> that can (I think iText can do it).
>
> XFA is some sort of XML. So after you have getDocument() you need to
> look at the XML you get. The XFA specification is 1500 pages long.
>
> If all the documents you want to handle have the same content, then you
> might be able to get what you need without reading it.
>
> Tilman
>
> On 23.02.2019 at 02:55, Nick Westerly wrote:
> > Hi, my ultimate goal is to extract text data from PDF forms using XFA. Is
> > it possible to use PDFBox to flatten PDFs with XFA forms (to simplify text
> > extraction)?
> >
> > If not, can the fields themselves be easily parsed?
> >
> > I see
> > https://stackoverflow.com/questions/14454387/pdfbox-how-to-flatten-a-pdf-form
> > which seems to say that XFA is not flattenable?
> >
> > I see this class,
> > https://pdfbox.apache.org/docs/1.8.12/javadocs/org/apache/pdfbox/pdmodel/interactive/form/PDXFA.html,
> > once I call getDocument, how can I get fields (by name/type) and contents?
> >
> > Thanks!
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

