Hi all.
I have developed a custom parser plugin for image documents, but I also need
the Tika parser for images. How can I configure multiple parsers for the same
document? I have read the thread linked below, but it says this is only
possible for HTML documents. I need to allow multiple parsers for png, jpg,
jpeg and gif, because I need to extract some fields using Tika and others with
my custom parse plugin. Below is my parse-plugins.xml file. At the moment Nutch
ignores parse-tika for the image MIME types, which is not what I need. Any
advice or suggestions would be appreciated.
********************************************************
<parse-plugins>
<mimeType name="*">
<plugin id="parse-tika" />
</mimeType>
<!-- images -->
<mimeType name="image/png">
<plugin id="parse-thumb" />
</mimeType>
<mimeType name="image/jpg">
<plugin id="parse-thumb" />
</mimeType>
<mimeType name="image/jpeg">
<plugin id="parse-thumb" />
</mimeType>
<mimeType name="image/gif">
<plugin id="parse-thumb" />
</mimeType>
<mimeType name="image/bmp">
<plugin id="parse-thumb" />
</mimeType>
<mimeType name="image/ico">
<plugin id="parse-thumb" />
</mimeType>
<!-- Types for parse-ext plugin: required for unit tests to pass. -->
<mimeType name="application/vnd.nutch.example.cat">
<plugin id="parse-ext" />
</mimeType>
<mimeType name="application/vnd.nutch.example.md5sum">
<plugin id="parse-ext" />
</mimeType>
<!-- alias mappings for parse-xxx names to the actual extension
implementation ids described in each plugin's plugin.xml file -->
<aliases>
<alias name="parse-thumb"
extension-id="ImageThumbnailParser" />
<alias name="parse-tika"
extension-id="org.apache.nutch.parse.tika.TikaParser" />
<alias name="parse-ext" extension-id="ExtParser" />
<alias name="parse-html"
extension-id="org.apache.nutch.parse.html.HtmlParser" />
<alias name="parse-js" extension-id="JSParser" />
<alias name="feed"
extension-id="org.apache.nutch.parse.feed.FeedParser" />
<alias name="parse-swf"
extension-id="org.apache.nutch.parse.swf.SWFParser" />
<alias name="parse-zip"
extension-id="org.apache.nutch.parse.zip.ZipParser" />
</aliases>
</parse-plugins>
********************************************************
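To make the goal concrete, this is roughly the kind of entry I have in mind,
shown for image/png only. It is only a sketch: as far as I understand, Nutch
treats several plugins listed under one mimeType as an ordered fallback list
and uses the first one that parses successfully, so this on its own may not
run both parsers on the same document.
********************************************************
<!-- sketch only: both plugins listed for one MIME type, custom parser first -->
<mimeType name="image/png">
<plugin id="parse-thumb" />
<plugin id="parse-tika" />
</mimeType>
********************************************************
If that is the case, I would need some other way to combine the fields
extracted by parse-thumb and parse-tika.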
http://lucene.472066.n3.nabble.com/Multiple-parsers-td3806721.html