Hi,
I get the following error message when I try to parse a CSV file: "Can't
retrieve Tika parser for mime-type text/csv".
I am using Nutch 1.4 and Solr 3.6.

The parsechecker gives the same message:
bin/nutch parsechecker http://dsiwikis/documents/forms/open_source_decls.csv
fetching: http://dsiwikis/documents/forms/open_source_decls.csv
parsing: http://dsiwikis/documents/forms/open_source_decls.csv
contentType: text/csv
---------
Url
---------------
http://dsiwikis/documents/forms/open_source_decls.csv---------
ParseData
---------
Version: 5
Status: failed(2,0): Can't retrieve Tika parser for mime-type text/csv
Title:
Outlinks: 0
Content Metadata:
Parse Metadata:

My parse-plugins.xml file contains:
        <mimeType name="text/csv">
                <plugin id="parse-tika" />
        </mimeType>

and my nutch-default.xml contains:
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>
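
As a quick sanity check on the plugin.includes value above, the expression can be tested against plugin directory names outside of Nutch. This is only a minimal sketch in Python: it assumes Nutch requires the whole plugin id to match the expression (my assumption, not something confirmed in this thread), and the candidate names are an illustrative sample rather than a full listing of the plugins directory.

```python
import re

# plugin.includes value copied from the configuration above
PLUGIN_INCLUDES = (
    r"protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    r"|scoring-opic|urlnormalizer-(pass|regex|basic)"
)

# Illustrative plugin directory names (hypothetical sample, not a full listing)
CANDIDATES = [
    "parse-tika",
    "parse-html",
    "parse-js",
    "protocol-httpclient",
    "nutch-extensionpoints",
]


def included_plugins(pattern, names):
    """Return the names that the whole pattern matches.

    Full-match semantics are an assumption about how Nutch applies
    plugin.includes to each plugin id.
    """
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


if __name__ == "__main__":
    for name in included_plugins(PLUGIN_INCLUDES, CANDIDATES):
        print(name)
```

If parse-tika is among the matched names, the plugin.includes setting itself is probably not what prevents the CSV file from being parsed.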

I searched the list and found a thread about this error, but it drifted
off topic and never answered the original problem.
Any ideas?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-t-retrieve-Tika-parser-for-mime-type-text-csv-tp3990071.html
Sent from the Nutch - User mailing list archive at Nabble.com.
