Hi Julien:
Thanks for your reply. I thought there would be a better way to implement this,
but it's working! Right now I put my code inside an if block; this is what I
use to detect the MIME type:
if (content.getContentType().contains("image")) {...}
Is this a bulletproof way of accomplishing this, or is there a "better" way?
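For comparison, here is a minimal sketch of a stricter check. It assumes that content.getContentType() may return null, may carry parameters such as "; charset=...", and may vary in case. The class name MimeTypeCheck and the method isImage are hypothetical names used only for illustration. The motivation is that contains("image") also matches non-image types such as application/x-apple-diskimage.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether a Content-Type
// string denotes an image by looking only at the top-level media type.
final class MimeTypeCheck {
    private MimeTypeCheck() {}

    static boolean isImage(String contentType) {
        // Assumption: the content type may be missing.
        if (contentType == null) {
            return false;
        }
        // Drop any parameters, e.g. "image/png; charset=binary" -> "image/png".
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType;
        // Normalize whitespace and case before comparing.
        mediaType = mediaType.trim().toLowerCase(Locale.ROOT);
        // Match the "image/" prefix rather than any occurrence of "image".
        return mediaType.startsWith("image/");
    }
}
```

In the plugin this would be used as if (MimeTypeCheck.isImage(content.getContentType())) {...}. If only some formats can be turned into thumbnails, comparing against an explicit set of the types listed in plugin.xml might be a safer choice than a prefix match.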
Greetings!
----- Original Message -----
From: "Julien Nioche" <[email protected]>
To: [email protected]
Sent: Monday, 12 November 2012 15:35:18
Subject: Re: How to restrict a plugin to some specific mimetype
Hi
> I have a plugin that generates a thumbnail from images and stores it in Solr.
> From a previous thread, Julien recommended that this plugin be
> rewritten as an HtmlParseFilter, so that Tika could extract the usual
> metadata from the image and my custom plugin would generate the thumbnail
> in addition to all the other metadata. So far so good, this works just fine.
>
Great
>
> But how can I configure Nutch so that my plugin only gets the image files?
> Right now the plugin tries to generate a thumbnail for every HTML
> page crawled by Nutch.
>
> I have this in my parse-plugins.xml:
>
> <mimeType name="image/png">
> <plugin id="parse-thumb" />
> </mimeType>
>
> <mimeType name="image/jpg">
> <plugin id="parse-thumb" />
> </mimeType>
>
Well, that won't prevent other mimetypes from going through Tika and then your parser.
>
> And in the plugin.xml inside my plugin's folder:
>
> <implementation id="ImageThumbnailParser"
>     class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"/>
> <parameter name="contentType"
> value="image/png|image/jpeg|image/jpg|image/gif|image/ico|image/bmp"/>
> <parameter name="pathSuffix" value=""/>
>
> What am I missing?
>
Simply add some code in your parser to get the mimetype of the current doc
and skip it if it does not match what you want.
HTH
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci