I tested your patch and confirmed that it does NOT work for encrypted files. Here's the stack trace:

Exception in thread "main" org.apache.pdfbox.exceptions.CryptographyException: Error: The supplied password does not match either the owner or user password in the document.
    at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:231)
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1014)
    at org.apache.pdfbox.ExtractText.main(ExtractText.java:184)

In case my line numbers are off, line 184 is:

    document.openProtection( sdm );

which happens before the lines that were commented out by your patch.

I believe you're saying that the text can be extracted from password protected, non-encrypted files. If it's possible to password protect PDFs without using encryption, that's news to me. I'm not sure what the point would be of password protecting something if you're not going to encrypt it, since that would only give a false sense of security, not any actual security.

So, I just wanted to clear that up so people don't read your post and think that all PDF security is completely broken. When I first read it, I thought you were implying that any password protected document could be read without the password.

As for whether we "should" be able to do this or not, I'd say the ExtractText program that comes with PDFBox should respect the permissions by default, and perhaps have an option to extract password protected, unencrypted documents (without a password). I'm not sure what one would call that option... -bypassPassword ?

--Adam


"Takashi Komatsubara" <takashi....@gmail.com>
08/31/2009 04:05
Please respond to pdfbox-dev@incubator.apache.org

To: <pdfbox-dev@incubator.apache.org>
cc:
Subject: Do we should be able to extract text from owner-password protected pdf file?

Hi team,

Technically, we can extract text from an "owner" password protected PDF file without specifying the "owner" password, right? Should we be able to do that, or not?

The reason I'm asking is that I am using PDFBox to audit the content of PDF files. So, whether or not the user wants the "text extract" permission disabled, I need to look into the content of "owner password" protected PDF files. The old PDFBox could do this.

What do you think?

Takashi
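
To make the default suggested above concrete (respect the permissions unless the caller explicitly opts out), here is a minimal sketch of the decision only. It assumes nothing about PDFBox internals: ExtractionPolicy, Decision and decide are hypothetical names used for illustration, and -bypassPassword is just the tentative option name floated above, not an existing ExtractText flag.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names exist in PDFBox.
final class ExtractionPolicy {

    enum Decision { EXTRACT, REFUSE }

    /**
     * Decides whether text extraction should go ahead.
     *
     * @param documentOpened      true if the document could be opened at all
     *                            (no password was needed, or a correct one was supplied)
     * @param extractionPermitted true if the document's permissions allow text extraction
     * @param bypassRequested     true if the caller passed the hypothetical -bypassPassword option
     */
    static Decision decide(boolean documentOpened,
                           boolean extractionPermitted,
                           boolean bypassRequested) {
        if (!documentOpened) {
            // Without a matching password the content cannot be read,
            // so no option can change the outcome.
            return Decision.REFUSE;
        }
        if (extractionPermitted || bypassRequested) {
            return Decision.EXTRACT;
        }
        // Default behaviour: respect the permissions set on the document.
        return Decision.REFUSE;
    }

    public static void main(String[] args) {
        boolean bypass = Arrays.asList(args).contains("-bypassPassword");
        // Example: a document that opens without a password but forbids extraction.
        System.out.println(decide(true, false, bypass));
    }
}
```

With this shape, an auditing use case like the one described in the original question would pass the bypass option explicitly, while everyone else gets the permission-respecting default.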