Hi all:

I have a plugin that generates a thumbnail from images and stores it in Solr. In 
a previous thread Julien recommended rewriting this plugin as an 
HtmlParseFilter, so that Tika extracts the usual metadata from the image and 
my custom plugin generates the thumbnail in addition to all the other 
metadata. So far so good, this works just fine. 

But how can I configure Nutch so that my plugin only gets the image files? 
Right now the plugin tries to generate a thumbnail for every HTML page crawled 
by Nutch.
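
One workaround I am considering is a guard at the top of the filter that skips 
anything whose content type is not an image. This is only a minimal sketch: the 
class and method names are hypothetical (they are not part of Nutch), and it 
assumes the content type string is available inside the filter.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether the content type
// reported for a fetched document looks like an image.
final class ImageTypeGuard {

    private ImageTypeGuard() {}

    // Accepts plain values such as "image/png" as well as values that
    // carry parameters after a ';'.
    static boolean isImage(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop any parameters after ';' and normalise case before comparing.
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType;
        return base.trim().toLowerCase(Locale.ROOT).startsWith("image/");
    }
}
```

Inside the filter I would pass it the value of content.getContentType() and 
return the ParseResult unchanged when it is false, but I would rather solve 
this through configuration if that is possible.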

I have this in my parse-plugins.xml:

        <mimeType name="image/png">
          <plugin id="parse-thumb" />
        </mimeType>

        <mimeType name="image/jpg">
          <plugin id="parse-thumb" />
        </mimeType>

And in the plugin.xml inside my plugin's folder:

      <implementation id="ImageThumbnailParser"
                      class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"/>
                      <parameter name="contentType"
                                 value="image/png|image/jpeg|image/jpg|image/gif|image/ico|image/bmp"/>
                      <parameter name="pathSuffix"  value=""/>

What am I missing?

Thanks in advance