Re: will nutch-2 be able to index image files

Andrzej Bialecki Wed, 09 Mar 2011 00:24:45 -0800

On 3/8/11 10:50 PM, [email protected] wrote:

I meant extracting the image title, src link and alt text from <img> tags, not
storing the image files. For a keyword search it must display the link, which
automatically displays the image itself in the search page.
Not sure what you mean by image content-based retrieval. Do image files have
tags like mp3 ones?


Yes, for example http://en.wikipedia.org/wiki/Exchangeable_image_file_format

Must a parse plugin be written in both cases?

Yes - most data is already available either in the DOM tree, or can be obtained from a Tika image parser, it just needs to be wrapped in a plugin.
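
To illustrate the DOM-tree half of that, here is a minimal sketch of collecting src, alt and title from <img> elements. It uses only the JDK's org.w3c.dom classes and is not Nutch plugin code: the class name ImgTagExtractor and its methods are hypothetical, and the sketch assumes lowercase attribute names and well-formed sample markup. In a real plugin the DOM would be handed over by the HTML parser rather than built from a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: walks a DOM tree and gathers <img> attributes.
final class ImgTagExtractor {

    /** Returns one map (src, alt, title) per <img> element found under root. */
    static List<Map<String, String>> extractImages(Node root) {
        List<Map<String, String>> images = new ArrayList<>();
        collect(root, images);
        return images;
    }

    private static void collect(Node node, List<Map<String, String>> out) {
        // HTML parsers differ on element-name case, so compare case-insensitively.
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "img".equalsIgnoreCase(node.getNodeName())) {
            Element img = (Element) node;
            Map<String, String> attrs = new LinkedHashMap<>();
            // Assumption: attribute names are lowercase.
            // getAttribute returns "" when the attribute is absent.
            attrs.put("src", img.getAttribute("src"));
            attrs.put("alt", img.getAttribute("alt"));
            attrs.put("title", img.getAttribute("title"));
            out.add(attrs);
        }
        // Depth-first walk over all children.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    /** Builds a DOM from well-formed markup; only used to feed the sketch with sample input. */
    static Document parseXml(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("sample markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml(
                "<html><body>"
                + "<img src=\"a.png\" alt=\"first\" title=\"A\"/>"
                + "<p><img src=\"b.jpg\" alt=\"second\"/></p>"
                + "</body></html>");
        for (Map<String, String> image : extractImages(doc)) {
            System.out.println(image);
        }
    }
}
```

The collected values could then be stored as parse metadata and mapped to index fields, so that a keyword match on the alt or title text returns the src link. Embedded tags such as EXIF are a separate step that would go through a Tika image parser instead.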



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
