Hi Fernando

> I have made some additions (a new parser) to the Apache Tika application
> and I am trying to see if I can run my new changes using the crawl
> mechanism in Nutch, but I am having some trouble updating Nutch with my
> modified Tika application.
>
> The Tika updates I made run fine if I run Tika as a standalone using either
> the command line or the Tika GUI.
>

OK


>
> I am using Nutch 1.2; 1.3 does not seem to run for me (I get an error
> saying C:/Program not found whenever I try to do anything), but 1.2 should
> be fine for what I am trying to do, which is just to see the parse results
> from the new parser I added to Tika.
>
> I have replaced the tika-core.jar, tika-parsers.jar and tika-mimetypes.xml
> files with my versions of those files as described in the following link:
> http://issues.apache.org/jira/browse/NUTCH-766. I also updated the
> nutch-site.xml to enable the parse-tika plugin. I also updated the
> parse-plugins.xml file with the following (afm files are what I am trying
> to parse):
>
>        <mimeType name="application/x-font-afm">
>                <plugin id="parse-tika" />
>        </mimeType>
>

This is not necessary: parse-tika is used by default for any mime-type
unless a different mime-type / parser mapping is specified in
parse-plugins.xml. It should not have any impact, though.


>
> I am crawling a personal site in which I have links to .afm files. If I
> crawl before making any updates to Nutch, it fetches the files fine. After
> making the updates detailed above, I get the following error: "fetch of
> http://scf.usc.edu/~jfarreol/woor2___.AFM failed with:
> java.lang.NoClassDefFoundError: org/apache/james/mime4j/MimeException".
>
> Not really sure what the issue is, but my guess is that I have not updated
> all the necessary files. Any help would be greatly appreciated.
>

Yep, sounds like you have a few jars missing (the class named in the error
belongs to Apache James Mime4j). Nutch-1.2 came with tika-0.7; which version
of Tika are you trying to use?
If you just added a new parser, it would be easier to ship it as a
separate jar file. I assume that you did not have to modify anything in
tika-core, so you could use the standard Tika libs and simply add yours
using Ivy.
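
A quick way to confirm which jar is missing is to check whether the class
named in the error can be loaded from a given classpath. The following is
only a minimal sketch: ClasspathCheck and isLoadable are hypothetical names,
not part of Nutch or Tika, and it assumes you run it with the same jars on
the classpath that you have given to Nutch.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its dependencies) is not available on this classpath
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the reported error; pass another name as an argument
        String name = args.length > 0 ? args[0] : "org.apache.james.mime4j.MimeException";
        System.out.println(name + (isLoadable(name) ? " found" : " MISSING"));
    }
}
```

If it reports the class as missing, the jar that provides it has not been
added alongside your Tika jars. Note that this only tests the classpath you
run it with, so it will not reproduce how Nutch loads jars for its plugins.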

Nutch-1.3 (and 1.4 in SVN) contains a lot of improvements over 1.2, so it
would be worth getting to the bottom of the issue you're encountering and
getting 1.3 to work. Moreover, I am not sure that you can use a version of
Tika later than 0.7 on Nutch 1.2 without changing parts of the code (to be
checked, though).

Julien




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
