AFAIK you can't, but it would be a nice improvement.

On Thu, Apr 1, 2010 at 12:31 AM, Miguel Prieto <[email protected]> wrote:
> I'm using Jackrabbit as a repository for PDF documents and I have some
> questions regarding text extraction. I'm using the repository locally,
> not remotely (RMI, DAV), i.e. Model 1 as shown in
> http://jackrabbit.apache.org/deployment-models.html
>
> In http://wiki.apache.org/jackrabbit/Search you can read that: "*Text
> extraction is done asynchronously in a background thread. That means
> changed or added text is not available immediately...*". I've also seen
> the configuration parameters, but I'd like to know a little more about
> how this thread is started and what is responsible for starting it. Can I
> keep it from running (for example, when doing a batch upload of
> documents)? Can I start it? Can anyone give me a hint about this?
>
> Also, I've been getting these 2 warnings after uploading some PDFs. How
> can I know which documents (binary properties) were causing them? Is
> there a way I can handle these warnings with some sort of listener class?
>
> *WARN * PDFStreamEngine: java.io.IOException: Error: expected hex character and not :32 (PDFStreamEngine.java, line 529)
> java.io.IOException: Error: expected hex character and not :32
>     at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:316)
>     at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:138)
>     at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:488)
>     at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:363)
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:343)
>     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:50)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:516)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:229)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:188)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
>     at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>     at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
>     at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
>     at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
>
>
> *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
> java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
>     at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
>     at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
>     at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>     at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
>     at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
>     at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> Thanks,
>
> Miguel Prieto
>
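The second warning is a NoClassDefFoundError for org.bouncycastle.jce.provider.BouncyCastleProvider, thrown from PDFBox's PDDocument.openProtection while PDFTextStripper tries to decrypt the document. That suggests the BouncyCastle provider jar is not on the repository's classpath, and that the PDFs hitting this warning are likely encrypted ones. A quick way to confirm the classpath part is a small standalone check run with the same classpath as the repository. This is a minimal sketch, not part of Jackrabbit, Tika or PDFBox; the class name BouncyCastleCheck is hypothetical, and only the provider class name is taken from the stack trace.

```java
public class BouncyCastleCheck {

    // Class name copied from the NoClassDefFoundError in the warning above.
    public static final String BC_PROVIDER =
            "org.bouncycastle.jce.provider.BouncyCastleProvider";

    /**
     * Returns true if the named class can be loaded by this class's loader.
     * initialize=false avoids running static initializers just to check.
     */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, BouncyCastleCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(BC_PROVIDER)) {
            System.out.println("BouncyCastle provider found on the classpath");
        } else {
            System.out.println("BouncyCastle provider missing: PDFBox cannot decrypt encrypted PDFs");
        }
    }
}
```

Run it with the same classpath the repository uses; if it reports the provider as missing, adding the BouncyCastle provider jar (typically named bcprov-*) next to the PDFBox jar should be the first thing to try.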
-- 
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
