Hi all:
I have a plugin that generates a thumbnail from images and stores it in Solr. In
a previous thread Julien recommended rewriting this plugin as an
HtmlParseFilter, so that Tika extracts the usual metadata from the image and my
custom plugin generates the thumbnail in addition to all the other metadata. So
far so good, this works just fine.
But how can I configure Nutch so that my plugin only gets the image files?
Right now the plugin tries to generate a thumbnail for every HTML page crawled
by Nutch.
I have this in my parse-plugins.xml:
<mimeType name="image/png">
<plugin id="parse-thumb" />
</mimeType>
<mimeType name="image/jpg">
<plugin id="parse-thumb" />
</mimeType>
And in the plugin.xml inside my plugin's folder:
<implementation id="ImageThumbnailParser"
class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"/>
<parameter name="contentType"
value="image/png|image/jpeg|image/jpg|image/gif|image/ico|image/bmp"/>
<parameter name="pathSuffix" value=""/>
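To make the intended behavior concrete, here is a minimal sketch of the check I would expect to be applied before my filter runs. ImageTypeGuard and isSupportedImage are hypothetical names of my own, not part of Nutch; the pattern is simply the contentType value from the plugin.xml above.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Nutch class): tests a content type against my image list. */
final class ImageTypeGuard {
    // Same alternatives as the contentType parameter in my plugin.xml.
    private static final Pattern IMAGE_TYPES = Pattern.compile(
        "image/png|image/jpeg|image/jpg|image/gif|image/ico|image/bmp");

    private ImageTypeGuard() {}

    /** Returns true only when the base content type is one of the listed image types. */
    static boolean isSupportedImage(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case before matching.
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType)
            .trim()
            .toLowerCase(Locale.ROOT);
        return IMAGE_TYPES.matcher(base).matches();
    }
}
```

I assume I could call this at the top of my filter(...) method with content.getContentType() and return the ParseResult unchanged when it is false, but I would prefer to do this through configuration if that is possible.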
What am I missing?
Thanks in advance