Hi Julien:

Thanks for your reply. I thought there would be a better way to implement
this, but it's working! Right now I put my code inside an if block; this is
what I use to detect the mime type:

if (content.getContentType().contains("image")) {...}

Is this a bulletproof way of accomplishing this, or is there a "better" way?

Greetings!


----- Original message -----
From: "Julien Nioche" <[email protected]>
To: [email protected]
Sent: Monday, 12 November 2012 15:35:18
Subject: Re: How to restrict a plugin to some specific mimetype

Hi


> I have a plugin that generates a thumbnail from images and stores it in Solr.
> In a previous thread Julien recommended that this plugin be rewritten as
> an HtmlParseFilter, so that Tika could extract the usual metadata from the
> image and my custom plugin would generate the thumbnail in addition to all
> the other metadata. So far so good, this works just fine.
>

Great


>
> But how can I configure Nutch so that my plugin only gets the image files?
> Right now the plugin tries to generate a thumbnail for every HTML page
> crawled by Nutch.
>
> I've this in my parse-plugins.xml
>
>         <mimeType name="image/png">
>           <plugin id="parse-thumb" />
>         </mimeType>
>
>         <mimeType name="image/jpg">
>           <plugin id="parse-thumb" />
>         </mimeType>
>

Well, that won't prevent other mimetypes from going through Tika and then
your parser.


>
> And in the plugin.xml inside my plugin's folder:
>
>       <implementation id="ImageThumbnailParser"
>
> class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"/>
>                       <parameter name="contentType"
> value="image/png|image/jpeg|image/jpg|image/gif|image/ico|image/bmp"/>
>                       <parameter name="pathSuffix"  value=""/>
>
> What I'm missing?
>

Simply add some code in your parser to get the mimetype of the current doc
and skip it if it does not match what you want.
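
A minimal sketch of such a check, assuming the string comes from
content.getContentType() as in the reply at the top of this thread. The class
and method names (ImageMimeCheck, isSupportedImage) are hypothetical and not
part of Nutch, and the whitelist is simply copied from the contentType
parameter quoted above. It compares against exact types rather than using a
substring test, since contains("image") matches any type whose name merely
contains that word.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: decides whether a Content-Type
// string names one of the image types the thumbnail plugin should handle.
final class ImageMimeCheck {

    // Assumed whitelist, copied from the contentType parameter in plugin.xml.
    private static final Set<String> SUPPORTED = Set.of(
            "image/png", "image/jpeg", "image/jpg",
            "image/gif", "image/ico", "image/bmp");

    private ImageMimeCheck() {}

    static boolean isSupportedImage(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType;
        return SUPPORTED.contains(base.trim().toLowerCase(Locale.ROOT));
    }
}
```

Inside the filter this could be used as
if (!ImageMimeCheck.isSupportedImage(content.getContentType())) followed by
returning the parse result unchanged, so that non-image documents are skipped.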

HTH

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci