Hi all.
I have developed a custom parser plugin to parse image documents, but I also
need the Tika parser for images. How can I configure multiple parsers for the
same document type?
I have read the thread linked below, but it says this is only possible for
HTML documents. I need to allow multiple parsers for png, jpg, jpeg and gif,
because I need to extract some fields using Tika and others using my custom
parse plugin. Below is my parse-plugins.xml file; with it, Nutch ignores
parse-tika for the image MIME types, which is not what I want. Any advice or
suggestions would be welcome.

********************************************************
<parse-plugins>

        <mimeType name="*">
          <plugin id="parse-tika" />
        </mimeType>

        <!-- images -->
        <mimeType name="image/png">
          <plugin id="parse-thumb" />
        </mimeType>

        <mimeType name="image/jpg">
          <plugin id="parse-thumb" />
        </mimeType>

        <mimeType name="image/jpeg">
          <plugin id="parse-thumb" />
        </mimeType>

        <mimeType name="image/gif">
          <plugin id="parse-thumb" />
        </mimeType>

        <mimeType name="image/bmp">
          <plugin id="parse-thumb" />
        </mimeType>

        <mimeType name="image/ico">
          <plugin id="parse-thumb" />
        </mimeType>

        <!-- Types for parse-ext plugin: required for unit tests to pass. -->

        <mimeType name="application/vnd.nutch.example.cat">
                <plugin id="parse-ext" />
        </mimeType>

        <mimeType name="application/vnd.nutch.example.md5sum">
                <plugin id="parse-ext" />
        </mimeType>

        <!-- alias mappings for parse-xxx names to the actual extension
             implementation ids described in each plugin's plugin.xml file -->
        <aliases>
                <alias name="parse-thumb"
                        extension-id="ImageThumbnailParser" />
                <alias name="parse-tika"
                        extension-id="org.apache.nutch.parse.tika.TikaParser" />
                <alias name="parse-ext" extension-id="ExtParser" />
                <alias name="parse-html"
                        extension-id="org.apache.nutch.parse.html.HtmlParser" />
                <alias name="parse-js" extension-id="JSParser" />
                <alias name="feed"
                        extension-id="org.apache.nutch.parse.feed.FeedParser" />
                <alias name="parse-swf"
                        extension-id="org.apache.nutch.parse.swf.SWFParser" />
                <alias name="parse-zip"
                        extension-id="org.apache.nutch.parse.zip.ZipParser" />
        </aliases>

</parse-plugins>
********************************************************
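
For clarity, this is the kind of entry I had in mind for one type (image/png),
listing both plugins under the same mimeType. It is only a sketch of what I
want, not a setup I know to work: my understanding, which I have not verified,
is that when several plugins are listed for one type Nutch tries them in order
and keeps the first successful result, instead of running all of them and
merging their fields.

********************************************************
        <!-- Sketch only: both plugins listed for a single MIME type.
             The ids are the aliases defined in the <aliases> section. -->
        <mimeType name="image/png">
          <plugin id="parse-thumb" />
          <plugin id="parse-tika" />
        </mimeType>
********************************************************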

http://lucene.472066.n3.nabble.com/Multiple-parsers-td3806721.html

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci