Hi,

On Wed, Dec 2, 2009 at 9:11 AM, <[email protected]> wrote:
> when I use pdfbox-0.8.0-incubating with Lucene 3.0, 2.9 I get warnings
> on the command line
> [...]
> INFO: unsupported/disabled operation: rg
> org.apache.pdfbox.util.PDFStreamEngine processOperator
> LucenePDFDocument.addContent(Document, InputStream, String) line: 413

Yes, I see that too and I'm a bit annoyed by it. There's actually no
reason for this warning as those operations are not relevant to text
extraction and thus PDFBox does the right thing by just ignoring them.
We just need to disable the log messages for such cases.

Can you file an improvement request about this in
https://issues.apache.org/jira/browse/PDFBOX? I can take it from
there.

> I have to use commons-logging, otherwise i get a Class not found
> Exception, but log slows down processing

The commons-logging dependency is a bit troublesome in terms of
classpath handling. Perhaps PDFBox should use the standard
java.util.logging instead. Can you file an improvement request for
that as well? We'll need to discuss it on d...@pdfbox.

> My questions are when will pdfbox support the changed API from Lucene
> 3.0 in LucenePDFDocument.getDocument?

As soon as someone writes a patch with the required changes. :-)

In fact I'd actually rather see us not depending on the Lucene API. A
better approach would be to make the LucenePDFDocument class (or a
more generically named alternative) simply return a Map of defined
key-value pairs that the client application can then turn into a
Lucene Document.

> Why must i use bcprov-jdk14-136.jar and bcmail-jdk14-136.jar to just
> check if PDF documents are encrypted?

They are needed for actually encrypting or decrypting a document, but
I guess they should be (or at least we could make them) optional if
you just want to check whether the document is encrypted or not.

BR,

Jukka Zitting
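
For illustration only, here is a minimal sketch of the key-value idea
mentioned above. Nothing in it is existing PDFBox or Lucene API: the
class name PdfFieldsSketch, the method toFields and the field names
"path", "title" and "contents" are all hypothetical, and the extracted
text is passed in as a plain String rather than read from a real PDF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not part of PDFBox. It only shows the shape of an
// extractor that hands back plain key-value pairs instead of a Lucene Document.
public class PdfFieldsSketch {

    // Collects the extracted values into an ordered map.
    // The field names are assumptions chosen for this example.
    public static Map<String, String> toFields(String path, String title, String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", path);
        if (title != null) {
            // Optional metadata is simply left out when it is missing.
            fields.put("title", title);
        }
        fields.put("contents", text);
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields =
                toFields("/tmp/example.pdf", "Example", "extracted text");

        // The client application decides what to do with each pair, for
        // instance adding it as a field to a document of whichever
        // Lucene version it happens to use.
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

With something along these lines the extraction side would have no
compile-time dependency on Lucene, so an API change like the one in
Lucene 3.0 would only affect the client's own conversion loop.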

