On Sun, 20 Jun 2010 17:22:12 +0200 reinhard schwab <[email protected]> wrote:
> if you read the README.txt file, you will read
>
> Apache Nutch README
>
> Important note: Due to licensing issues we cannot provide two
> libraries that are normally provided with PDFBox (jai_core.jar,
> jai_codec.jar), the parser library we use for parsing PDF files.
> If you encounter unexpected problems when
> working with PDF files please
[...]

I think he is saying that he followed those instructions, and some
PDFs still fail.

Peter, are you still getting the error message about
java.lang.NoClassDefFoundError: javax/media/jai/PlanarImage, or are
the PDFs failing for some other reason? I know that the instructions
for the Java Advanced Imaging API worked for us.

If JAI is still failing, are you sure that the JAI .jar files are in
the right place? Where did you put them?

Regards,
Gora
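
P.S. One way to check whether the JAI classes are visible at all is a small stand-alone test along these lines. This is only a rough sketch: the class name JaiCheck and its locate() helper are made up for illustration and are not part of Nutch or PDFBox. It also only tells you something if you run it with the same classpath that is used for PDF parsing, which depends on your setup.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of Nutch or PDFBox.
class JaiCheck {

    // Returns a description of where the class was loaded from,
    // or null if the class cannot be loaded from the current classpath.
    static String locate(String className) {
        try {
            // 'false' means: look the class up without running its static initializers.
            Class<?> c = Class.forName(className, false, JaiCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes usually report no code source.
            return src == null ? "(bootstrap or unknown location)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The class named in the NoClassDefFoundError message.
        String name = "javax.media.jai.PlanarImage";
        String where = locate(name);
        if (where == null) {
            System.out.println(name + " was NOT found on this classpath");
        } else {
            System.out.println(name + " was loaded from " + where);
        }
    }
}
```

If it reports that the class was not found, the .jar files are probably not where the JVM is looking. If it prints a location, that tells you which copy of jai_core.jar is actually being picked up.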

