Hi,
When I try to parse a password-protected PDF using this code:
InputStream input = new FileInputStream(new File(resourceLocation));
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
PDFParser parser = new PDFParser();
try {
    // Extract the body text and metadata from the PDF
    parser.parse(input, textHandler, metadata);
} finally {
    // Close the stream even if parsing throws
    input.close();
}
out.println("Title: " + metadata.get("title"));
out.println("Author: " + metadata.get("Author"));
out.println("content: " + textHandler.toString());
I get this exception:
Could not parse document:class org.apache.tika.exception.TikaException:Unable to extract PDF content
org.apache.tika.exception.TikaException: Unable to extract PDF content
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:76)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:96)
    at org.apache.tika.parser.AbstractParser.parse(AbstractParser.java:53)
    at com.lucidimagination.article.tika.TikaParsePdf.parse(TikaParsePdf.java:43)
    at com.lucidimagination.article.tika.TikaParsePdf.main(TikaParsePdf.java:28)
Caused by: org.apache.pdfbox.exceptions.WrappedIOException: Error decrypting document, details:
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:314)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
    ... 4 more
Caused by: org.apache.pdfbox.exceptions.CryptographyException: Error: The supplied password does not match either the owner or user password in the document.
    at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:239)
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1325)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:796)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:310)
    ... 5 more
Can anybody help me parse a password-protected PDF?
Thanks and regards,
Chethan