Yikes, I forgot to mention: this is using Tika 1.0 with all the appropriate dependencies. I'm using PDFBox version 1.6.0. This is running on Java 6 on Mac OS X 10.6.
Any help getting pdfs to parse would be greatly appreciated!

Thanks,
-Arthur Meneau

On Dec 5, 2011, at 2:32 PM, Arthur Meneau wrote:

> I'm having some issues when I attempt to parse pdf and docx files while using
> ForkParser. I finally figured out how to get the metadata from the content
> handler using ToXMLContentHandler, but I am not getting any results when I
> parse pdfs or DocX files using ForkParser. I have copy and pasted the error
> and stack trace below, but I believe this only applies to PDF files as I am
> not getting a unique error for DocX files.
>
> I did a little searching, using java's jar utility to verify that the class
> shows up in the Jar and it does. I was previously using the AutoDetectParser
> and that was working just fine with the exception that I could not limit
> tika's memory usage, so this seems to be unique to ForkParser.
>
> Thanks,
> -Arthur
>
> StackTrace:
> log4j:WARN Caught Exception while in Loader.getResource. This may be innocuous.
> java.lang.NoClassDefFoundError: org/apache/tika/fork/MemoryURLStreamHandler$Record
>     at org.apache.tika.fork.MemoryURLStreamHandler.createURL(MemoryURLStreamHandler.java:46)
>     at org.apache.tika.fork.ClassLoaderProxy.findResource(ClassLoaderProxy.java:73)
>     at java.lang.ClassLoader.getResource(ClassLoader.java:977)
>     at org.apache.log4j.helpers.Loader.getResource(Loader.java:96)
>     at org.apache.log4j.LogManager.<clinit>(LogManager.java:105)
>     at org.apache.log4j.Logger.getLogger(Logger.java:104)
>     at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
>     at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
>     at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:914)
>     at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
>     at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
>     at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
>     at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
>     at org.apache.pdfbox.pdfparser.BaseParser.<clinit>(BaseParser.java:58)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1087)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:80)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.tika.fork.ForkServer.call(ForkServer.java:136)
>     at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:116)
>     at org.apache.tika.fork.ForkServer.main(ForkServer.java:64)
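
For anyone reproducing this: the jar check described above can also be done from Java, by asking a class loader whether it can see the class file as a resource. The following is only a rough sketch under stated assumptions. The class name ClassPresenceCheck and its two helper methods are made up for illustration and are not part of Tika. It also only tells you what the JVM running it can see. The trace above ends in ForkServer.main, so the failure appears to happen in the forked child process, whose class loading may differ from the parent's.

```java
import java.net.URL;

// Hypothetical diagnostic helper, not part of Tika or PDFBox.
public class ClassPresenceCheck {

    // Converts a binary class name such as "a.b.C$D" into the
    // resource path "a/b/C$D.class" that a class loader looks up.
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns the URL the class file would be loaded from,
    // or null if the given loader cannot see it.
    static URL locate(String className, ClassLoader loader) {
        return loader.getResource(toResourcePath(className));
    }

    public static void main(String[] args) {
        // Defaults to the class named in the NoClassDefFoundError above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.tika.fork.MemoryURLStreamHandler$Record";
        URL url = locate(name, ClassLoader.getSystemClassLoader());
        System.out.println(name + " -> "
                + (url == null ? "not visible to this class loader" : url.toString()));
    }
}
```

Run it with the same classpath as the application. A printed jar URL would confirm what the jar utility already showed, namely that the class is present for the parent JVM, which would point the search toward how the forked process resolves classes.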
