Hi Fernando
> I have made some additions (a new parser) to the Apache Tika application and
> I am trying to see if I can run my new changes using the crawl mechanism in
> Nutch, but I am having some trouble updating Nutch with my modified Tika
> application.
>
> The Tika updates I made run fine if I run Tika as a standalone using either
> the command line or the Tika GUI.

OK

> I am using Nutch 1.2, 1.3 seems to not be able to run for me (I get an error
> saying C:/Program not found whenever I try to do anything but 1.2 should be
> fine for what I am trying to do which is just to see the parse results from
> the new parser I added to Tika).
>
> I have replaced the tika-core.jar, tika-parsers.jar and tika-mimetypes.xml
> files with my versions of those files as described in the following link:
> http://issues.apache.org/jira/browse/NUTCH-766. I also updated the
> nutch-site.xml to enable the parse-tika plugin. I also updated the
> parse-plugins.xml file with the following (afm files are what I am trying to
> parse):
>
> <mimeType name="application/x-font-afm">
>   <plugin id="parse-tika" />
> </mimeType>

This is not necessary as by default parse-tika is used for any mime-type unless the mapping mime-type / parser is specified in parse-plugins.xml. This should not have an impact though

> I am crawling a personal site in which I have links to .afm files. If I
> crawl before making any updates to Nutch, it fetches the files fine. After
> making the updates detailed above, I get the following error: "fetch of
> http://scf.usc.edu/~jfarreol/woor2___.AFM failed with:
> java.lang.NoClassDefFoundError: org/apache/james/mime4j/MimeException".
>
> Not really sure, what the issue is but my guess is that I have not updated
> all the necessary files. Any help would be greatly appreciated.

yep, sounds like you have a few jars missing. Nutch-1.2 came with tika-0.7, which version of tika are you trying to use?
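To see which jars are actually missing, it can help to scan the directory holding the jars you copied and check whether any of them contains the class named in the error. The following is only a rough sketch, not part of Nutch or Tika: the class `FindClassInJars` and its method are made-up names, and the directory you pass in is an assumption (use whichever directory holds the parse-tika jars in your install).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only.
public class FindClassInJars {

    // Returns the jars directly inside dir that contain the given class.
    public static List<Path> findJarsContaining(Path dir, String className) throws IOException {
        // A class a.b.C is stored in a jar as the entry a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getJarEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: first argument is the directory to scan (defaults to the current one).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Defaults to the class from the NoClassDefFoundError quoted above.
        String className = args.length > 1 ? args[1] : "org.apache.james.mime4j.MimeException";
        List<Path> hits = findJarsContaining(dir, className);
        if (hits.isEmpty()) {
            System.out.println("No jar in " + dir + " contains " + className);
        } else {
            for (Path hit : hits) {
                System.out.println(className + " found in " + hit);
            }
        }
    }
}
```

If nothing is reported for the directory you scan, the jar providing that class was probably not copied over along with the Tika jars.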
If you just added a new parser, then it would be easier to ship it as a separate jar file. I assume that you did not have to modify anything in tika-core, so you could use the standard Tika libs and simply add yours using Ivy.

Nutch 1.3 (and 1.4 in SVN) contains a lot of improvements over 1.2, so it would be worth getting to the bottom of the issue you're encountering and getting 1.3 to work. Moreover, I am not sure that you can use a version of Tika > 0.7 on Nutch 1.2 without changing parts of the code (to be checked though).

Julien

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

