Thanks a lot, Julien, for your answer; it was very useful to me.
I have also read this tutorial, which was very useful too:
http://sujitpal.blogspot.com/2009/07/nutch-custom-plugin-to-parse-and-add.html

Sorry for my bad English.
> Please, can anybody tell me how to allow multiple parsers
> for a document? I'm using Nutch 1.5.1 and Solr 3.6.
>
> I have developed a custom parser plugin to parse image documents, but
> I also need the Tika parsers for images, so how can I set multiple
> parsers for a document?
> I have read this thread:
> http://lucene.472066.n3.nabble.com/Multiple-parsers-td3806721.html
> but it says that this is only possible for HTML documents. Is that true,
> or is there another way?


Re-read the thread and you'll get the answer to your question.


> I need to allow multiple parsers for png, jpg, jpeg and gif, because I
> need to extract some fields using Tika and others with my custom parse
> plugin. This is my parse-plugins.xml file; at the moment Nutch ignores
> parse-tika for the image MIME types, which is wrong for my purpose. Any
> advice or suggestions would be appreciated.


There is no need to modify parse-plugins.xml: simply rewrite parse-thumb as
an HtmlParseFilter and you'll get both the Tika metadata and your own. The
plugin directory contains several examples of such filters. Search the
mailing list archives or the wiki for explanations of Parser vs
HtmlParseFilter, and see https://issues.apache.org/jira/browse/NUTCH-1482
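
To illustrate the difference between a parser and a parse filter, here is a
minimal, self-contained sketch. It does not use the Nutch API: MetadataFilter,
baseParse, THUMB_FILTER and the "thumb.source" field are hypothetical stand-ins
invented for this example. The idea it models is that one parser produces the
parse for a MIME type (the role parse-tika plays here), and any number of
filters then add their own fields to that same parse, so nothing is lost.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParseFilterSketch {

    /** Hypothetical stand-in for a filter extension point: it receives the
     *  metadata of an existing parse and may add fields to it. */
    interface MetadataFilter {
        void filter(String url, Map<String, String> parseMeta);
    }

    /** Hypothetical stand-in for the single parser selected for a MIME type
     *  (the role parse-tika plays for images). The value is made up. */
    static Map<String, String> baseParse(String url) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("Content-Type", "image/png");
        return meta;
    }

    /** Hypothetical stand-in for parse-thumb rewritten as a filter: it adds
     *  its own field and leaves the parser's fields untouched. */
    static final MetadataFilter THUMB_FILTER =
            (url, meta) -> meta.put("thumb.source", url);

    /** Runs the parser once, then lets every filter enrich the result. */
    static Map<String, String> parseWithFilters(String url,
                                                List<MetadataFilter> filters) {
        Map<String, String> meta = baseParse(url);
        for (MetadataFilter f : filters) {
            f.filter(url, meta);
        }
        return meta;
    }

    public static void main(String[] args) {
        Map<String, String> meta = parseWithFilters(
                "http://example.com/a.png", Arrays.asList(THUMB_FILTER));
        // The map now holds both the parser's field and the filter's field.
        System.out.println(meta);
    }
}
```

In a real plugin the filter would implement Nutch's HtmlParseFilter extension
point instead of the stand-in interface above; the filters shipped in the
plugin directory show the actual method signature and plugin.xml wiring.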

HTH

Julien
--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


