Hi all. Sorry for my bad English. Can anybody tell me how to allow multiple parsers for the same document? I'm using Nutch 1.5.1 and Solr 3.6.
I have developed a custom parser plugin to parse image documents, but I also need the Tika parsers for images, so how can I set multiple parsers for a single document? I have read this thread, http://lucene.472066.n3.nabble.com/Multiple-parsers-td3806721.html, but it says that this is only possible for HTML documents. Is that true, or is there another way?

I need to allow multiple parsers for png, jpg, jpeg, and gif, because I need to extract some fields using Tika and others with my custom parse plugin. Below is my parse-plugins.xml file. Right now Nutch ignores parse-tika for the image MIME types, which is wrong for my purpose. Any advice or suggestions are welcome.

********************************************************
<parse-plugins>

  <mimeType name="*">
    <plugin id="parse-tika" />
  </mimeType>

  <mimeType name="image/png">
    <plugin id="parse-thumb" />
  </mimeType>

  <mimeType name="image/jpg">
    <plugin id="parse-thumb" />
  </mimeType>

  <mimeType name="image/jpeg">
    <plugin id="parse-thumb" />
  </mimeType>

  <mimeType name="image/gif">
    <plugin id="parse-thumb" />
  </mimeType>

  <mimeType name="image/bmp">
    <plugin id="parse-thumb" />
  </mimeType>

  <mimeType name="image/ico">
    <plugin id="parse-thumb" />
  </mimeType>

  <!-- Types for parse-ext plugin: required for unit tests to pass. -->
  <mimeType name="application/vnd.nutch.example.cat">
    <plugin id="parse-ext" />
  </mimeType>

  <mimeType name="application/vnd.nutch.example.md5sum">
    <plugin id="parse-ext" />
  </mimeType>

  <!-- alias mappings for parse-xxx names to the actual extension implementation
       ids described in each plugin's plugin.xml file -->
  <aliases>
    <alias name="parse-thumb" extension-id="ImageThumbnailParser" />
    <alias name="parse-tika" extension-id="org.apache.nutch.parse.tika.TikaParser" />
    <alias name="parse-ext" extension-id="ExtParser" />
    <alias name="parse-html" extension-id="org.apache.nutch.parse.html.HtmlParser" />
    <alias name="parse-js" extension-id="JSParser" />
    <alias name="feed" extension-id="org.apache.nutch.parse.feed.FeedParser" />
    <alias name="parse-swf" extension-id="org.apache.nutch.parse.swf.SWFParser" />
    <alias name="parse-zip" extension-id="org.apache.nutch.parse.zip.ZipParser" />
  </aliases>

</parse-plugins>
********************************************************
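For illustration, parse-plugins.xml does accept more than one plugin entry per MIME type. The fragment below is only a sketch of how the image/png mapping above could list both parsers; it is a hypothetical variant, not a tested configuration. As far as I understand it, Nutch treats the listed plugins as an order of preference and stops at the first one that returns a successful parse, so this would give a fallback from parse-thumb to parse-tika rather than fields from both parsers on the same document.

```xml
<!-- Hypothetical variant of the image/png entry: both plugins listed.
     Assumption: Nutch tries them top to bottom and keeps the first
     successful parse; it does not merge the output of the two. -->
<mimeType name="image/png">
  <plugin id="parse-thumb" />
  <plugin id="parse-tika" />
</mimeType>
```

Both plugins would also have to be enabled through the plugin.includes property in nutch-site.xml for either entry to take effect.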

