Yikes, I forgot to mention: this is Tika 1.0 with all the appropriate 
dependencies, including PDFBox 1.6.0.  It is running on Java 6 on 
Mac OS X 10.6.

Any help getting PDFs to parse would be greatly appreciated!

Thanks,
-Arthur Meneau

On Dec 5, 2011, at 2:32 PM, Arthur Meneau wrote:

> I'm having trouble parsing PDF and DOCX files with ForkParser.  I finally 
> figured out how to get the metadata from the content handler using 
> ToXMLContentHandler, but I get no results when I parse PDF or DOCX files 
> through ForkParser.  I have copied and pasted the error and stack trace 
> below; I believe it applies only to PDF files, since I am not getting a 
> separate error for DOCX files.
> 
> I did a little searching and used Java's jar utility to verify that the 
> class is present in the JAR, and it is.  I was previously using 
> AutoDetectParser, which worked fine except that I could not limit Tika's 
> memory usage, so the problem seems to be specific to ForkParser.
> 
> Thanks,
> -Arthur
> 
> Stack trace:
> log4j:WARN Caught Exception while in Loader.getResource. This may be innocuous.
> java.lang.NoClassDefFoundError: org/apache/tika/fork/MemoryURLStreamHandler$Record
>       at org.apache.tika.fork.MemoryURLStreamHandler.createURL(MemoryURLStreamHandler.java:46)
>       at org.apache.tika.fork.ClassLoaderProxy.findResource(ClassLoaderProxy.java:73)
>       at java.lang.ClassLoader.getResource(ClassLoader.java:977)
>       at org.apache.log4j.helpers.Loader.getResource(Loader.java:96)
>       at org.apache.log4j.LogManager.<clinit>(LogManager.java:105)
>       at org.apache.log4j.Logger.getLogger(Logger.java:104)
>       at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
>       at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
>       at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:914)
>       at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
>       at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
>       at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
>       at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
>       at org.apache.pdfbox.pdfparser.BaseParser.<clinit>(BaseParser.java:58)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1087)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
>       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:80)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.tika.fork.ForkServer.call(ForkServer.java:136)
>       at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:116)
>       at org.apache.tika.fork.ForkServer.main(ForkServer.java:64)
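
For anyone wanting to repeat the jar check described in the quoted message, 
here is a minimal sketch using only the JDK's java.util.jar API.  The class 
name JarClassCheck and its two helper methods are made up for illustration 
and are not part of Tika; the default class name is simply the one reported 
in the NoClassDefFoundError above.  Note that this only confirms the entry 
exists inside a given jar file.  It says nothing about whether the class 
loader in the forked process can actually resolve it.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Tika: checks whether a jar file
// contains the .class entry for a given binary class name.
public class JarClassCheck {

    // Converts a binary class name (dots between packages, '$' for nested
    // classes) into the entry path used inside a jar file.
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar at jarPath has an entry for className.
    static boolean containsClass(String jarPath, String className)
            throws IOException {
        JarFile jar = new JarFile(jarPath);
        try {
            return jar.getEntry(entryNameFor(className)) != null;
        } finally {
            jar.close(); // explicit close so this also compiles on Java 6
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: JarClassCheck <jar> [className]");
            return;
        }
        // Default to the class named in the NoClassDefFoundError.
        String className = args.length > 1
                ? args[1]
                : "org.apache.tika.fork.MemoryURLStreamHandler$Record";
        System.out.println(className + " present: "
                + containsClass(args[0], className));
    }
}
```

Run it with the path to the Tika jar as the first argument (and, if you 
like, a different class name as the second).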
