pdf acroform and tika

Allison, Timothy B. Thu, 23 Feb 2012 09:30:18 -0800

Not sure if this is an issue for PDFBox or Tika, but I noticed that PDFBox's
PDFTextStripper is not extracting the contents of the form fields (AcroForm)
in a batch of PDF documents I'm processing.  Is anyone else having this problem?
I regret that I'm unable to send an example document.
Inelegant workaround, error handling not included:
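import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

StringBuilder sb = new StringBuilder();
// Get the page text with the text stripper first, then append the form fields.
// pdDoc is the already-loaded PDDocument.
PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
if (catalog != null) {
    PDAcroForm form = catalog.getAcroForm();
    if (form != null) {
        List<PDField> fields = form.getFields();
        for (PDField field : fields) {
            // One "name: value" line per field
            sb.append(field.getFullyQualifiedName()).append(": ")
              .append(field.getValue()).append("\r\n");
        }
    }
}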
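
One caveat: if a form nests its fields hierarchically, the loop above would
likely see only the top-level entries, so child fields could be missed. Below
is a minimal sketch of a recursive walk that builds the dotted names. The
Field class and the dump method are hypothetical stand-ins so the example
runs on its own; they are not PDFBox classes, and a real version would have
to be adapted to whatever child-field accessors your PDFBox version provides.

```java
import java.util.ArrayList;
import java.util.List;

public class FormFieldDump {

    // Hypothetical stand-in for a PDF form field; NOT a PDFBox class.
    static final class Field {
        final String name;
        final String value;                     // null for non-terminal fields
        final List<Field> kids = new ArrayList<>();

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }

        Field add(Field kid) {
            kids.add(kid);
            return this;
        }
    }

    // Depth-first walk: appends one "full.name: value" line per terminal field.
    static void dump(List<Field> fields, String prefix, StringBuilder sb) {
        for (Field f : fields) {
            String fullName = prefix.isEmpty() ? f.name : prefix + "." + f.name;
            if (f.kids.isEmpty()) {
                sb.append(fullName).append(": ").append(f.value).append("\r\n");
            } else {
                dump(f.kids, fullName, sb);     // descend into child fields
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Field> top = new ArrayList<>();
        top.add(new Field("name", "Ann"));
        top.add(new Field("address", null).add(new Field("city", "Boston")));

        StringBuilder sb = new StringBuilder();
        dump(top, "", sb);
        System.out.print(sb);
    }
}
```

With the made-up sample data in main, this emits one line for "name" and one
for "address.city"; the parent "address" node contributes only its name prefix.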
