Hello,

I have made some additions (a new parser) to the Apache Tika application and
I am trying to see if I can run my new changes using the crawl mechanism in
Nutch, but I am having some trouble updating Nutch with my modified Tika
application.

The Tika updates I made work fine when I run Tika standalone, from either
the command line or the Tika GUI.

I am using Nutch 1.2 because I cannot get 1.3 to run (I get an error saying
"C:/Program not found" whenever I try to do anything). 1.2 should be fine
for what I am trying to do, which is just to see the parse results from the
new parser I added to Tika.

I have replaced the tika-core.jar, tika-parsers.jar and tika-mimetypes.xml
files with my versions of those files as described in the following link:
http://issues.apache.org/jira/browse/NUTCH-766. I also enabled the
parse-tika plugin in nutch-site.xml and added the following to
parse-plugins.xml (.afm files are what I am trying to parse):

        <mimeType name="application/x-font-afm">
                <plugin id="parse-tika" />
        </mimeType>

I am crawling a personal site in which I have links to .afm files. If I
crawl before making any updates to Nutch, it fetches the files fine. After
making the updates detailed above, I get the following error: "fetch of
http://scf.usc.edu/~jfarreol/woor2___.AFM failed with:
java.lang.NoClassDefFoundError: org/apache/james/mime4j/MimeException".

I am not sure what the issue is, but my guess is that I have not updated
all the necessary files. Any help would be greatly appreciated.
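
In case it helps anyone reproduce this, here is a minimal sketch of how I
could test that guess outside of Nutch. The class name ClasspathCheck and
the helper isLoadable are my own hypothetical names, not part of Nutch or
Tika. It only checks whether a class can be found on the classpath given to
the java command, so it assumes I pass it the same jars I copied into Nutch;
Nutch may load plugin jars differently, so a positive result here would not
prove that the plugin itself can see the class.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by the class loader
     * that loaded ClasspathCheck (hypothetical helper, for illustration).
     */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the fetch error above
        String name = args.length > 0
                ? args[0]
                : "org.apache.james.mime4j.MimeException";
        System.out.println(name + (isLoadable(name) ? " is" : " is NOT")
                + " loadable from this classpath");
    }
}
```

I would compile this and run it with java -cp, listing the replaced Tika
jars plus the directory containing ClasspathCheck.class, to see whether the
mime4j class is reachable from those jars alone.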

Thank you,
Fernando Arreola
